Abstract

This paper provides a new conceptual perspective on survey propagation, which is an iterative
algorithm recently introduced by the statistical physics community that is very effective in solving
random k-SAT problems even with densities close to the satisfiability threshold. We first describe
how any SAT formula can be associated with a novel family of Markov random fields (MRFs),
parameterized by a real number ρ ∈ [0, 1]. We then show that applying belief propagation (a
well-known "message-passing" technique for estimating marginal probabilities) to this family of
MRFs recovers a known family of algorithms, ranging from pure survey propagation at one extreme
(ρ = 1) to standard belief propagation on the uniform distribution over SAT assignments at the
other extreme (ρ = 0). Configurations in these MRFs have a natural interpretation as partial
satisfiability assignments, on which a partial order can be defined. We isolate cores as minimal
elements in this partial ordering, which are also fixed points of survey propagation and the only
assignments with positive probability in the MRF for ρ = 1. Our experimental results for k = 3
suggest that solutions of random formulas typically do not possess non-trivial cores. This makes it
necessary to study the structure of the space of partial assignments for ρ < 1 and investigate the
role of assignments that are very close to being cores. To that end, we investigate the associated
lattice structure, and prove a weight-preserving identity that shows how any MRF with ρ > 0 can be
viewed as a "smoothed" version of the uniform distribution over satisfying assignments (ρ = 0).
Finally, we isolate properties of Gibbs sampling and message-passing algorithms that are typical for
an ensemble of k-SAT problems.

1 Introduction



Constraint satisfaction problems play an important role across a broad spectrum of computer
science, including computational complexity theory [9], coding theory, and artificial
intelligence [34, 14]. Important but challenging problems include devising efficient algorithms for
finding satisfying assignments (when the problem is indeed satisfiable), or conversely providing a
certificate of unsatisfiability. One of the best-known examples of a constraint satisfaction problem
is the k-SAT problem, which is a classical NP-complete problem for all k ≥ 3. In trying to understand
the origin of its hardness, a great deal of research has been devoted to the properties of formulas
drawn from different probability distributions. One of the most natural models for random k-SAT
problems is the following: for a fixed density parameter α > 0, choose m = αn clauses uniformly
and with replacement from the set of all k-clauses on n variables. Despite its simplicity, many
essential properties of this model are yet to be understood: in particular, the hardness of deciding
if a random formula is satisfiable and finding a satisfying assignment for a random formula are both
major open problems [25, 42].

One of the most exciting recent developments in satisfiability problems has its origins not in
computer science, but rather in statistical physics. More specifically, the ground-breaking
contribution of Mézard, Parisi and Zecchina [28], as described in an article published in "Science",
is the development of a new algorithm for solving k-SAT problems. A particularly dramatic feature
of this method, known as survey propagation, is that it appears to remain effective at solving
very large instances of random k-SAT problems, even with densities very close to the satisfiability
threshold, a regime where other "local" algorithms (e.g., the WSAT method [37]) typically
fail. Given this remarkable behavior, the survey propagation algorithm has generated a great deal
of excitement and follow-up work in both the statistical physics and computer science communities.
Nonetheless, despite the considerable progress to date, the reasons
underlying the remarkable performance of survey propagation are not yet fully understood.

1.1 Our contributions 

This paper provides a novel conceptual perspective on survey propagation — one that not only sheds 
light on the reasons underlying its success, but also places it within a broader framework of related 
"message-passing" algorithms that are widely used in different branches of computer science. More 
precisely, by introducing a new family of Markov random fields (MRFs) that are associated with 
any k-SAT problem, we show how a range of algorithms (including survey propagation as a special
case) can all be recovered as instances of the well-known belief propagation algorithm, as
applied to suitably restricted MRFs within this family. This equivalence is important because belief
propagation is a message-passing algorithm, widely used and studied in various areas including
coding theory, computer vision and artificial intelligence, for computing
approximations to marginal distributions in Markov random fields. Moreover, this equivalence
motivates a deeper study of the combinatorial properties of the family of extended MRFs associated 
with survey propagation. Indeed, one of the main contributions of our work is to reveal the 
combinatorial structures underlying the survey propagation algorithm. 

The configurations in our extended MRFs turn out to have a natural interpretation as particular 
types of partial SAT assignments, in which a subset of variables are assigned 0 or 1 values in
such a way that the remaining formula does not contain any empty or unit clauses. To provide 
some geometrical intuition for our results, it is convenient to picture these partial assignments as 
Figure 1. The set of fully assigned satisfying configurations occupy the top plane, and are arranged
into clusters. Enlarging to the space of partial assignments leads to a new space with better
connectivity. Minimal elements in the partial ordering are known as cores. Each core corresponds to
one or more clusters of solutions from the top plane. In this example, one of the clusters has as a
core a non-trivial partial assignment, whereas the others are connected to the all-* assignment.



arranged in layers depending on the number of assigned variables, so that the top layer consists of 
fully assigned satisfying configurations. Figure 1 provides an idealized illustration of the space of
partial assignments viewed in this manner. It is argued [29, 32, 2] that for random formulas with
high density of clauses, the set of fully assigned configurations are separated into disjoint clusters
that cause local message-passing algorithms like belief propagation to break down (see Figure 2
for an illustration). Based on our results, the introduction of partial SAT assignments yields a 
modified search space that is far less fragmented, thereby permitting a local algorithm like belief 
propagation to find solutions. 

We show that there is a natural partial ordering associated with this enlarged space, and we refer 
to minimal elements in this partial ordering as cores. We prove that any core is a fixed point of the 
pure form of survey propagation (ρ = 1). This fact indicates that each core represents a summary
of one cluster of solutions. However, our experimental results for k = 3 indicate that the solutions
of random formulas typically have trivial cores (i.e., the empty assignment). This observation
motivates deeper study of the full family of Markov random fields for the range 0 ≤ ρ ≤ 1, as well
as the associated belief propagation algorithms, which we denote by SP(ρ). Accordingly, we study
the lattice structure of the partial assignments, and prove a combinatorial identity that reveals
how the distribution for ρ ∈ (0, 1) can be viewed as a "smoothed" version of the MRF with ρ = 0.
Our experimental results on the SP(ρ) algorithms indicate that they are most effective for values
of ρ close to, and not necessarily equal to, 1. One intriguing possibility is that the effectiveness of
pure survey propagation (i.e., SP(1)) may be a by-product of the fact that SP(ρ) is most effective
for values of ρ less than 1, but going to 1 as n goes to infinity. The near-core assignments, which
are the ones of maximum weight in this case, may correspond to quasi-solutions of the cavity
equations, as defined by Parisi [33]. In addition, we consider alternative sampling-based methods
(e.g., Gibbs sampling) for computing marginals for the extended MRFs. We also study properties
of both message-passing and Gibbs sampling that are typical over a random ensemble of k-SAT
problems. We establish results that link the typical behavior of Gibbs sampling and message-passing
algorithms under suitable initialization, and when applied to the extended family of MRFs
with ρ sufficiently close to one.

The fact that the pure form of survey propagation (i.e., SP(1) in our notation) is a form of 
belief propagation was first conjectured by Braunstein et al., and established independently of
our work by Braunstein and Zecchina. In other independent work, Aurell et al. provided an
alternative derivation of SP(1) that established a link to belief propagation. However, both of these
papers treat only the case ρ = 1, and do not provide a combinatorial interpretation based on an
underlying Markov random field. The results established here are a strict generalization, applying to
the full range of ρ ∈ [0, 1]. Moreover, the structures intrinsic to our Markov random fields (namely
cores and lattices) highlight the importance of values ρ ≠ 1, and place the survey propagation
algorithm on a combinatorial ground. As we discuss later, this combinatorial perspective has
already inspired subsequent work [2] on survey propagation for satisfiability problems. Looking
forward, the methodology of partial assignments may also open the door to other problems where a
complicated landscape prevents local search algorithms from finding good solutions. As a concrete
example, a subset of the current authors has recently shown that related ideas can be leveraged
to perform lossy data compression at near-optimal (Shannon limit) rates. 

1.2 Organization 

The remainder of this paper is organized in the following way: 

• In Section 1.3, we provide further background on the k-SAT problem, as well as previous
work on survey propagation.

• In Section 2, we introduce required notation and set up the problem more precisely.

• In Section 3, we define a family of Markov random fields (MRFs) over partial satisfiability
assignments, and prove that survey propagation and related algorithms correspond to belief
propagation on these MRFs.

• Section 4 is devoted to analysis of the combinatorial properties of this family of extended
MRFs, as well as some experimental results on cores and Gibbs sampling.

• In Section 5, we consider properties of random ensembles of SAT formulae, and prove results
that link the performance of survey propagation and Gibbs sampling to the choice of Markov
random field.

• We conclude with a discussion in Section 6.

We note that many of the results reported here have been presented (without proofs or details) as
an extended SODA abstract [26].

1.3 Previous work on k-SAT and survey propagation

As a classical NP-complete problem, the k-SAT problem for k ≥ 3 has been extensively studied.
One approach is to consider ensembles of random formulas; in particular, a commonly studied
ensemble is based on choosing m = αn clauses uniformly and with replacement from the set
of all k-clauses on n variables. Clearly, a formula drawn randomly from this ensemble becomes
increasingly difficult to satisfy as the clause density α > 0 increases. There is a large body of
(a) 0 < α < α_d     (b) α_d < α < α_c     (c) α_c < α

Figure 2. The black dots represent satisfying assignments, and white dots unsatisfying assignments.
Distance is to be interpreted as the Hamming distance between assignments. (a) For low densities the
space of satisfying assignments is well connected. (b) As the density increases above α_d the space is
believed to break up into an exponential number of clusters, each containing an exponential number
of assignments. These clusters are separated by a "sea" of unsatisfying assignments. (c) Above α_c
all assignments become unsatisfying.



work devoted to the study of the threshold density where the formula becomes
unsatisfiable; however, except for the case k = 2, the value of the threshold is currently unknown.
Nonetheless, non-rigorous techniques from statistical physics can be applied to yield estimates of
the threshold; for instance, results from Mézard and Zecchina yield a threshold estimate of
α_c ≈ 4.267 for k = 3.

The survey propagation (SP) algorithm, as introduced by Mézard, Parisi and Zecchina [28], is
an iterative message-passing technique that is able to find satisfying assignments for large instances
of SAT problems at much higher densities than previous methods. The derivation of SP is based
on the cavity method in conjunction with the 1-step replica symmetry breaking (1-RSB) ansatz
of statistical physics. We do not go into these ideas in depth here, but refer the reader to the
physics literature for further details. In brief, the main assumption is the existence of a
critical value α_d for the density, smaller than the threshold density α_c, at which the structure of
the space of solutions of a random formula changes. For densities below α_d the space of solutions
is highly connected: in particular, it is possible to move from one solution to any other by single
variable flips,¹ staying at all times in a satisfying assignment. For densities above α_d, the space
of solutions breaks up into clusters, so that moving from a SAT assignment within one cluster
to some other assignment within another cluster requires flipping some constant fraction of the
variables simultaneously. Figure 2 illustrates how the structure of the space of solutions evolves
as the density of a random formula increases. The clustering phenomenon that is believed to
occur in the second phase is known in the statistical physics literature as 1-step replica symmetry
breaking, and the estimated value for α_d in the case k = 3 is approximately 3.921. Within each cluster,
a distinction can be made between frozen variables — ones that do not change their value within 
the cluster — and free variables that do change their value in the cluster. A concise description of 
a cluster is an assignment of {0, 1, *} to the variables with the frozen variables taking their frozen 

¹ There is no general agreement on whether assignments should be considered neighbors if they differ
in only one variable, or any constant number of variables.
value, and the free variables taking the joker or wild card value *. The original argument for the
clustering assumption was the analysis of simpler satisfiability problems, such as XOR-SAT, where
the existence of clusters can be demonstrated by rigorous methods. In addition, if one assumes
that there are no clusters, the cavity method calculation yields a value α_c > 5 (for k = 3),
which is known to be wrong. More recently, Mora, Mézard and Zecchina [32] have demonstrated
via rigorous methods that for k ≥ 8 and some clause density below the unsatisfiability threshold,
clusters of solutions do indeed exist.

The survey propagation (SP) algorithm is so named because, like the belief propagation
algorithm [34, 45], it entails propagating statistical information in the form of messages between
nodes in the graph. In the original derivation of the updates [28], the messages are interpreted as
"surveys" taken over the clusters in solution space, which provide information about the fraction 
of clusters in which a given variable is free or frozen. However, prior to the work presented here, 
it was not clear how to interpret the algorithm as an instantiation of belief propagation, and thus 
as a method for computing (approximations) to marginal distributions in a certain Markov ran- 
dom field (MRF). Moreover, as discussed above, our formulation of SP in this manner provides a 
broader view, in which SP is one of many possible message-passing algorithms that can be applied 
to smoothed MRF representations of SAT problems. 

2 Background and problem set-up 

In this section, we begin with notation and terminology necessary to describe the k-SAT problem,
and then provide a precise description of the survey propagation updates. 

2.1 The k-SAT problem and factor graphs

Basic notation: Let C and V represent index sets for the clauses and variables, respectively,
where |V| = n and |C| = m. We denote elements of V using the letters i, j, k, etc., and members
of C with the letters a, b, c, etc. We use x_S to denote the subset of variables {x_i : i ∈ S}. In the
k-SAT problem, the clause indexed by a ∈ C is specified by the pair (V(a), J_a), where V(a) ⊂ V
consists of k elements, and J_a := (J_{a,i} : i ∈ V(a)) is a k-tuple of {0, 1}-valued weights. The clause
indexed by a is satisfied by the assignment x if and only if x_{V(a)} ≠ J_a. Equivalently, letting δ(y, z)
denote an indicator function for the event {y = z}, if we define the function

\psi_{J_a}(x) := 1 - \prod_{i \in V(a)} \delta(J_{a,i}, x_i), \qquad (1)

then the clause a is satisfied by x if and only if ψ_{J_a}(x) = 1. The overall formula consists of the
AND of all the individual clauses, and is satisfied by x if and only if ∏_{a ∈ C} ψ_{J_a}(x) = 1.
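
As a minimal sketch of equation (1), the clause function can be evaluated directly. The representation below (a clause as a tuple of (variable index, J) pairs with 0-based indices) and the names `psi`, `satisfies` and `clause_a` are illustrative assumptions, not notation from the paper.

```python
from typing import Sequence, Tuple

# Assumed representation: a clause is a tuple of (variable index, J) pairs,
# where J is the single value of that variable that fails to satisfy the clause.
Clause = Tuple[Tuple[int, int], ...]

def psi(clause: Clause, x: Sequence[int]) -> int:
    """Equation (1): 1 minus the product of delta(J_{a,i}, x_i) over i in V(a)."""
    all_equal = all(x[i] == J for i, J in clause)
    return 0 if all_equal else 1

def satisfies(formula: Sequence[Clause], x: Sequence[int]) -> bool:
    """The formula is satisfied iff the product of psi over all clauses equals 1."""
    return all(psi(clause, x) == 1 for clause in formula)

# A clause with V(a) = {1, 2, 3} and J_a = (0, 1, 1), written with 0-based indices.
clause_a: Clause = ((0, 0), (1, 1), (2, 1))
```

With this convention, the only assignment of the three variables that violates `clause_a` is the one that agrees with J_a in every coordinate.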

Factor graphs: A convenient graphical representation of any k-SAT problem is provided by the
formalism of factor graphs (see [21] for further background). As illustrated in Figure 3, any instance
of the k-SAT problem can be associated with a particular bipartite graph on the variables (denoted
by circular nodes) and clauses (denoted by square nodes), where the edge (a, i) between the clause
a ∈ C and variable i ∈ V is included in E if and only if i ∈ V(a). Following Braunstein et al., it
is convenient to introduce two labellings of any given edge (namely, solid or dotted), corresponding
to whether J_{a,i} is equal to 0 or 1 respectively.



Figure 3. Factor graph representation of a 3-SAT problem on n = 5 variables with m = 4 clauses,
in which circular and square nodes correspond to variables and clauses respectively. Solid and dotted
edges (a, i) correspond to the weightings J_{a,i} = 0 and J_{a,i} = 1 respectively. The clause a is defined
by the neighborhood set V(a) = {1, 2, 3} and weights J_a = (0, 1, 1). In traditional notation, this
corresponds to a four-clause formula over the variable triples (x_1, x_2, x_3), (x_1, x_2, x_4),
(x_2, x_3, x_5) and (x_2, x_4, x_5), with each literal negated or not according to the weights J_{a,i}.



For later use, we define (for each i ∈ V) the set C(i) := {a ∈ C : i ∈ V(a)}, corresponding to
those clauses that impose constraints on variable x_i. This set of clauses can be decomposed into
two disjoint subsets

C^-(i) := \{a \in C(i) : J_{a,i} = 1\}, \qquad C^+(i) := \{a \in C(i) : J_{a,i} = 0\}, \qquad (2)

according to whether the clause is satisfied by x_i = 0 or x_i = 1 respectively. Moreover, for each pair
(a, i) ∈ E, the set C(i)\{a} can be divided into two (disjoint) subsets, depending on whether their
preferred assignment of x_i agrees (in which case b ∈ C^s_a(i)) or disagrees (in which case b ∈ C^u_a(i))
with the preferred assignment of x_i corresponding to clause a. More formally, we define

C^s_a(i) := \{b \in C(i) \setminus \{a\} : J_{a,i} = J_{b,i}\}, \qquad C^u_a(i) := \{b \in C(i) \setminus \{a\} : J_{a,i} \neq J_{b,i}\}. \qquad (3)
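
The sets in equations (2) and (3) are simple to compute from a formula. The sketch below assumes a clause is stored as a tuple of (variable index, J) pairs and that a variable appears at most once per clause; the function names `clause_sets` and `agree_disagree` are hypothetical.

```python
from typing import List, Set, Tuple

Clause = Tuple[Tuple[int, int], ...]  # assumed format: (variable index, J) pairs

def clause_sets(formula: List[Clause], i: int) -> Tuple[Set[int], Set[int]]:
    """Return (C^-(i), C^+(i)): clauses satisfied by x_i = 0 and by x_i = 1."""
    c_minus, c_plus = set(), set()
    for a, clause in enumerate(formula):
        for var, J in clause:
            if var == i:
                (c_minus if J == 1 else c_plus).add(a)
    return c_minus, c_plus

def agree_disagree(formula: List[Clause], a: int, i: int) -> Tuple[Set[int], Set[int]]:
    """Return (C^s_a(i), C^u_a(i)) as in equation (3)."""
    J_ai = dict(formula[a])[i]
    c_minus, c_plus = clause_sets(formula, i)
    same, other = (c_minus, c_plus) if J_ai == 1 else (c_plus, c_minus)
    return same - {a}, set(other)

# Illustrative formula: variable 0 appears in all three clauses.
example = [((0, 0), (1, 1), (2, 1)), ((0, 1), (1, 0), (3, 0)), ((0, 0), (2, 0), (4, 1))]
```

Note that C^s_a(i) and C^u_a(i) are just C^+(i) and C^-(i) relabelled relative to clause a, with a itself removed.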

Our focus is on random ensembles of k-SAT instances: for a given clause density α > 0, a
random instance is obtained by sampling m = αn clauses uniformly and with replacement from
the set of all k-clauses on n variables. In terms of the factor graph representation, this procedure
samples a random (n, m)-bipartite graph, in which each clause a ∈ C has degree k.
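
A minimal sketch of sampling from this ensemble follows. It assumes that m is obtained by rounding αn to the nearest integer and that a k-clause is drawn by choosing k distinct variables and an independent uniform weight J for each; the function name `random_ksat` and the `seed` parameter are illustrative.

```python
import random
from typing import List, Tuple

Clause = Tuple[Tuple[int, int], ...]  # assumed format: (variable index, J) pairs

def random_ksat(n: int, alpha: float, k: int = 3, seed: int = 0) -> List[Clause]:
    """Draw m = round(alpha * n) clauses independently (i.e., with replacement)."""
    rng = random.Random(seed)
    m = int(round(alpha * n))
    formula = []
    for _ in range(m):
        variables = rng.sample(range(n), k)  # k distinct variables of the clause
        formula.append(tuple((v, rng.randint(0, 1)) for v in variables))
    return formula
```

Because each clause is drawn independently of the others, the same clause can appear more than once, which matches sampling with replacement.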

Markov random fields and marginalization: The k-SAT problem can also be associated with
a particular distribution defined as a Markov random field. Recall that a given instance of k-SAT
can be specified by the collection of clause functions {ψ_{J_a} : a ∈ C}, as defined in equation (1).
Using these functions, let us define a probability distribution over binary sequences via

p(x) := \frac{1}{Z} \prod_{a \in C} \psi_{J_a}(x), \qquad (4)

where Z := \sum_{x \in \{0,1\}^n} \prod_{a \in C} \psi_{J_a}(x) is the normalization constant. Note that this
definition makes sense if and only if the k-SAT instance is satisfiable, in which case the
distribution (4) is simply the uniform distribution over satisfying assignments.
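
For very small instances, the normalization constant Z and the distribution (4) can be computed by exhaustive enumeration. The sketch below is only meant to make the definition concrete (its cost is exponential in n); the clause format and function names are assumptions.

```python
from itertools import product

def weight(formula, x):
    """Product of the clause functions psi: 1 if x satisfies every clause, else 0."""
    return int(all(any(x[i] != J for i, J in clause) for clause in formula))

def partition_function(formula, n):
    """Z = number of satisfying assignments in {0, 1}^n."""
    return sum(weight(formula, x) for x in product((0, 1), repeat=n))

def probability(formula, n, x):
    """p(x) from equation (4); undefined when the formula is unsatisfiable."""
    Z = partition_function(formula, n)
    if Z == 0:
        raise ValueError("unsatisfiable formula: the distribution is undefined")
    return weight(formula, x) / Z
```

For the single clause (x_0 ∨ x_1), encoded as `[((0, 0), (1, 0))]`, three of the four assignments are satisfying, so each receives probability 1/3.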

This Markov random field representation (4) of any satisfiable formula motivates a
marginalization-based approach to finding a satisfying assignment. In particular, suppose that we
had an oracle that could compute exactly the marginal probability

p(x_i) = \sum_{\{x_j,\; j \in V \setminus \{i\}\}} p(x_1, x_2, \ldots, x_n), \qquad (5)

for a particular variable x_i. Note that this marginal reveals the existence of SAT configurations
with x_i = 0 (if p(x_i = 0) > 0) or x_i = 1 (if p(x_i = 1) > 0). Therefore, a SAT configuration
could be obtained by a recursive marginalization-decimation procedure, consisting of computing
the marginal p(x_i), appropriately setting x_i (i.e., decimating), and then reiterating the procedure
on the modified Markov random field.
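
The marginalization-decimation idea can be illustrated with a brute-force stand-in for the exact oracle. This sketch counts satisfying extensions by enumeration, so it is only usable on tiny formulas; the names `count_solutions` and `decimate` and the clause format are assumptions made for illustration.

```python
from itertools import product

def count_solutions(formula, n, fixed):
    """Count satisfying assignments that agree with the partial dict `fixed`."""
    free = [i for i in range(n) if i not in fixed]
    count = 0
    for values in product((0, 1), repeat=len(free)):
        x = dict(fixed)
        x.update(zip(free, values))
        if all(any(x[i] != J for i, J in clause) for clause in formula):
            count += 1
    return count

def decimate(formula, n):
    """Fix variables one at a time, always keeping at least one solution alive."""
    fixed = {}
    if count_solutions(formula, n, fixed) == 0:
        return None  # unsatisfiable: no marginal is defined
    for i in range(n):
        # The marginal of x_i = 0 is positive iff some solution extends `fixed` with x_i = 0.
        fixed[i] = 0 if count_solutions(formula, n, {**fixed, i: 0}) > 0 else 1
    return [fixed[i] for i in range(n)]
```

Each decimation step preserves satisfiability of the remaining problem, which is why an exact oracle always terminates with a satisfying assignment.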

Of course, exact marginalization is computationally intractable in general [10, 12], which
motivates the use of efficient algorithms for approximate marginalization. An example of such an
algorithm is what we will refer to as the "naive belief propagation algorithm". The belief
propagation (BP) algorithm, described in detail in the appendix, can be applied to an MRF of the
form (4) to estimate the marginal probabilities. Even though the BP algorithm is not exact, an
intuitively reasonable approach is to set the variable that has the largest bias towards a particular
value, and repeat. In fact, this marginalization-decimation approach based on naive BP finds a
satisfying assignment for α up to approximately 3.9 for k = 3; for higher α, however, the iterations
for BP typically fail to converge.



2.2 Survey propagation 

In contrast to the naive BP approach, a marginalization-decimation approach based on survey
propagation appears to be effective in solving random k-SAT problems even close to threshold [28].
Here we provide an explicit description of what we refer to as the SP(ρ) family of algorithms, where
setting the parameter ρ = 1 yields the pure form of survey propagation. For any given ρ ∈ [0, 1], the
algorithm involves updating messages from clauses to variables, as well as from variables to clauses.
Each clause a ∈ C passes a real number η_{a→i} ∈ [0, 1] to each of its variable neighbors i ∈ V(a). In
the other direction, each variable i ∈ V passes a triplet of real numbers
Π_{i→a} = (Π^u_{i→a}, Π^s_{i→a}, Π^*_{i→a}) to each of its clause neighbors a ∈ C(i). The precise form
of the updates is given in Figure 4.



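Message from clause a to variable i:

\eta_{a \to i} = \prod_{j \in V(a) \setminus \{i\}} \left[ \frac{\Pi^u_{j \to a}}{\Pi^u_{j \to a} + \Pi^s_{j \to a} + \Pi^*_{j \to a}} \right] \qquad (6)

Message from variable i to clause a:

\Pi^u_{i \to a} = \Big[ 1 - \rho \prod_{b \in C^u_a(i)} (1 - \eta_{b \to i}) \Big] \prod_{b \in C^s_a(i)} (1 - \eta_{b \to i}) \qquad (7a)

\Pi^s_{i \to a} = \Big[ 1 - \prod_{b \in C^s_a(i)} (1 - \eta_{b \to i}) \Big] \prod_{b \in C^u_a(i)} (1 - \eta_{b \to i}) \qquad (7b)

\Pi^*_{i \to a} = \prod_{b \in C^s_a(i)} (1 - \eta_{b \to i}) \prod_{b \in C^u_a(i)} (1 - \eta_{b \to i}) \qquad (7c)

Figure 4: SP(ρ) updates

We pause to make a few comments about these SP(ρ) updates: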
1. Although we have omitted the time step index for simplicity, equations (6) and (7) should be
interpreted as defining a recursion on (η, Π). The initial values for η are chosen randomly in
the interval (0, 1).

2. The idea of the ρ parameter is to provide a smooth transition from the original naive belief
propagation algorithm to the survey propagation algorithm. As has been shown previously, setting
ρ = 0 yields the belief propagation updates applied to the probability distribution (4), whereas
setting ρ = 1 yields the pure version of survey propagation.
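
The recursion described in these comments can be sketched as one synchronous sweep of the updates (6) and (7a)-(7c). The code assumes the updates take the form Π^u = [1 − ρ ∏_{C^u}(1 − η)] ∏_{C^s}(1 − η), Π^s = [1 − ∏_{C^s}(1 − η)] ∏_{C^u}(1 − η) and Π^* = ∏_{C^s}(1 − η) ∏_{C^u}(1 − η); the data layout (clauses as tuples of (variable, J) pairs, messages in a dictionary keyed by (a, i)) and the zero-normalizer convention are illustrative choices, not part of the algorithm's specification.

```python
import math

def sp_iteration(formula, eta, rho):
    """One synchronous sweep: variable-to-clause triplets, then new eta messages.

    formula: list of clauses, each a tuple of (variable, J) pairs (assumed format).
    eta: dict mapping (a, i) to the current message from clause a to variable i.
    """
    # Group the clauses touching each variable together with their weights J.
    touching = {}
    for a, clause in enumerate(formula):
        for i, J in clause:
            touching.setdefault(i, []).append((a, J))

    pi = {}
    for i, neighbours in touching.items():
        for a, J_ai in neighbours:
            # Products over C^s_a(i) (same J, excluding a) and C^u_a(i) (opposite J).
            prod_s = math.prod(1 - eta[b, i] for b, J in neighbours if b != a and J == J_ai)
            prod_u = math.prod(1 - eta[b, i] for b, J in neighbours if J != J_ai)
            pi[i, a] = ((1 - rho * prod_u) * prod_s,  # (7a)
                        (1 - prod_s) * prod_u,        # (7b)
                        prod_s * prod_u)              # (7c)

    new_eta = {}
    for a, clause in enumerate(formula):
        for i, _ in clause:
            value = 1.0
            for j, _ in clause:
                if j != i:
                    u, s, star = pi[j, a]
                    total = u + s + star
                    # Convention assumed here: a zero normalizer contributes 0.
                    value *= u / total if total > 0 else 0.0
            new_eta[a, i] = value  # (6)
    return new_eta
```

In practice one would initialize `eta` with random values in (0, 1) and call `sp_iteration` repeatedly until the messages stop changing; convergence is not guaranteed in general.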

2.2.1 Intuitive "warning" interpretation 

To gain intuition for these updates, it is helpful to consider the pure SP setting of ρ = 1. As
described by Braunstein et al., the messages in this case have a natural interpretation in terms
of probabilities of warnings. In particular, at time t = 0, suppose that the clause a sends a warning
message to variable i with probability η_{a→i}, and a message without a warning with probability
1 − η_{a→i}. After receiving all messages from clauses in C(i)\{a}, variable i sends a particular
symbol to clause a saying either that it can't satisfy it ("u"), that it can satisfy it ("s"), or that it
is indifferent ("*"), depending on what messages it got from its other clauses. There are four cases:

1. If variable i receives warnings from C^u_a(i) and no warnings from C^s_a(i), then it cannot satisfy
a and sends "u".

2. If variable i receives warnings from C^s_a(i) but no warnings from C^u_a(i), then it sends an "s"
to indicate that it is inclined to satisfy the clause a.

3. If variable i receives no warnings from either C^u_a(i) or C^s_a(i), then it is indifferent and sends
"*".

4. If variable i receives warnings from both C^u_a(i) and C^s_a(i), a contradiction has occurred.

The updates from clauses to variables are especially simple: in particular, any given clause sends a
warning if and only if it receives "u" symbols from all of its other variables.

In this context, the real-valued messages involved in the pure SP(1) all have natural probabilistic
interpretations. In particular, the message η_{a→i} corresponds to the probability that clause a sends
a warning to variable i. The quantity Π^u_{j→a} can be interpreted as the probability that variable j
sends the "u" symbol to clause a, and similarly for Π^s_{j→a} and Π^*_{j→a}. The normalization by the
sum Π^u_{j→a} + Π^s_{j→a} + Π^*_{j→a} reflects the fact that the fourth case is a failure, and hence is excluded
a priori from the probability distribution.
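
This interpretation can be checked numerically on a toy example by enumerating every pattern of independent warnings and tallying which symbol the variable would send. The sketch assumes that a warning from an agreeing clause (in C^s_a(i)) pushes the variable toward "s" and a warning from a disagreeing clause (in C^u_a(i)) pushes it toward "u"; the function name and argument layout are hypothetical.

```python
from itertools import product
from math import prod

def symbol_probabilities(eta_s, eta_u):
    """Probabilities of "u", "s", "*" and a contradiction under independent warnings.

    eta_s / eta_u: warning probabilities of the clauses in C^s_a(i) / C^u_a(i).
    """
    probs = {"u": 0.0, "s": 0.0, "*": 0.0, "contradiction": 0.0}
    etas = list(eta_s) + list(eta_u)
    for warns in product((False, True), repeat=len(etas)):
        # Probability of this exact pattern of warnings.
        p = prod(e if w else 1 - e for e, w in zip(etas, warns))
        from_s = any(warns[:len(eta_s)])
        from_u = any(warns[len(eta_s):])
        if from_s and from_u:
            key = "contradiction"
        elif from_u:
            key = "u"
        elif from_s:
            key = "s"
        else:
            key = "*"
        probs[key] += p
    return probs
```

The three non-contradictory totals coincide with the unnormalized ρ = 1 triplet, for example P("u") = [1 − ∏_{C^u}(1 − η)] ∏_{C^s}(1 − η).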

Suppose that all of the possible warning events were independent. In this case, the SP message
update equations (6) and (7) would be the correct estimates for the probabilities. This independence
assumption is valid on a graph without cycles, and in that case the SP updates do have a rigorous
probabilistic interpretation. It is not clear if the equations have a simple interpretation in the case
ρ ≠ 1.

2.2.2 Decimation based on survey propagation

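Supposing that these survey propagation updates are applied and converge, the overall conviction
of a value at a given variable is computed from the incoming set of equilibrium messages as

\mu_i(1) \propto \Big[ 1 - \rho \prod_{b \in C^+(i)} (1 - \eta_{b \to i}) \Big] \prod_{b \in C^-(i)} (1 - \eta_{b \to i}),

\mu_i(0) \propto \Big[ 1 - \rho \prod_{b \in C^-(i)} (1 - \eta_{b \to i}) \Big] \prod_{b \in C^+(i)} (1 - \eta_{b \to i}),

\mu_i(*) \propto \prod_{b \in C^+(i)} (1 - \eta_{b \to i}) \prod_{b \in C^-(i)} (1 - \eta_{b \to i}).

To be consistent with their interpretation as (approximate) marginals, the triplet
{μ_i(0), μ_i(1), μ_i(*)} at each node i ∈ V is normalized to sum to one. We define the bias of a
variable node as B(i) := |μ_i(0) − μ_i(1)|.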

The marginalization-decimation algorithm based on survey propagation consists of the
following steps:

1. Run SP(1) on the SAT problem. Extract the fraction β of variables with the largest biases,
and set them to their preferred values.

2. Simplify the SAT formula, and return to Step 1. 

Once the maximum bias over all variables falls below a pre-specified tolerance, the Walk-SAT 
algorithm is applied to the formula to find the remainder of the assignment (if possible). Intuitively, 
the goal of the initial phases of decimation is to find a cluster; once inside the cluster, the induced 
problem is considered easy to solve, meaning that any "local" algorithm should perform well within 
a given cluster. 
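
A single decimation step can be sketched as follows, given converged messages. The code assumes that the convictions take the form μ_i(1) ∝ [1 − ρ ∏_{C^+(i)}(1 − η)] ∏_{C^-(i)}(1 − η), symmetrically for μ_i(0), with μ_i(*) the product of both factors, and that the bias is |μ_i(0) − μ_i(1)|. The function names, the rounding of the fraction β, and the rule of fixing at least one variable are illustrative assumptions.

```python
from math import prod

def marginals(eta_plus, eta_minus, rho=1.0):
    """Normalized (mu0, mu1, mu_star) from messages of clauses in C^+(i) and C^-(i)."""
    p_plus = prod(1 - e for e in eta_plus)
    p_minus = prod(1 - e for e in eta_minus)
    mu1 = (1 - rho * p_plus) * p_minus
    mu0 = (1 - rho * p_minus) * p_plus
    mu_star = p_plus * p_minus
    total = mu0 + mu1 + mu_star
    if total == 0:
        raise ValueError("contradictory messages: normalizer is zero")
    return mu0 / total, mu1 / total, mu_star / total

def bias(mu0, mu1):
    """Bias of a variable node: absolute difference of its 0 and 1 convictions."""
    return abs(mu0 - mu1)

def pick_to_fix(mu, beta):
    """mu: dict i -> (mu0, mu1, mu_star). Return {i: preferred value} for the top fraction beta."""
    ranked = sorted(mu, key=lambda i: bias(mu[i][0], mu[i][1]), reverse=True)
    count = max(1, int(beta * len(ranked)))  # assumed: always fix at least one variable
    return {i: (0 if mu[i][0] > mu[i][1] else 1) for i in ranked[:count]}
```

After fixing the selected variables, the formula would be simplified and SP rerun, as in Step 2 above.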

3 Markov random fields over partial assignments 

In this section, we show how a large class of message-passing algorithms (including the SP(ρ) family
as a particular case) can be recovered by applying the well-known belief propagation algorithm to
a novel class of Markov random fields (MRFs) associated with any k-SAT problem. We begin by
introducing the notion of a partial assignment, and then use it to define the family of MRFs over 
these assignments. 

3.1 Partial assignments 

Suppose that the variables x = {x_1, ..., x_n} are allowed to take values in {0, 1, *}, which we refer
to as a partial assignment. It will be convenient, when discussing the assignment of a variable x_i
with respect to a particular clause a, to use the notation s_{a,i} := 1 − J_{a,i} and u_{a,i} := J_{a,i} to
indicate, respectively, the values that are satisfying and unsatisfying for the clause a. With this
set-up, we have the following:



- bec+{j) 
Definition 1. A partial assignment x is invalid for a clause a if either

(a) all variables are unsatisfying (i.e., x_i = u_{a,i} for all i ∈ V(a)), or

(b) all variables are unsatisfying except for exactly one index j ∈ V(a), for which x_j = *.

Otherwise, the partial assignment x is valid for clause a, and we denote this event by VAL_a(x_{V(a)}).
We say that a partial assignment is valid for a formula if it is valid for all of its clauses.

The motivation for deeming case (a) invalid is clear, in that any partial assignment that does
not satisfy the clause must be excluded. Note that case (b) is also invalid, since (with all other
variables unsatisfying) the variable x_j is effectively forced to s_{a,j}, and so cannot be assigned the *
symbol.
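
Definition 1 translates directly into a validity check. In this sketch a partial assignment is a sequence whose entries are 0, 1 or the string "*", and a clause is a tuple of (variable index, J) pairs; both are representation choices made for illustration.

```python
STAR = "*"

def is_valid_for_clause(clause, x):
    """Definition 1 for a single clause; x[i] is 0, 1 or "*"."""
    values = [x[i] for i, _ in clause]
    unsat = sum(1 for (_, J), v in zip(clause, values) if v == J)
    stars = sum(1 for v in values if v == STAR)
    k = len(clause)
    if unsat == k:                     # case (a): every variable is unsatisfying
        return False
    if unsat == k - 1 and stars == 1:  # case (b): one star, all others unsatisfying
        return False
    return True

def is_valid(formula, x):
    """A partial assignment is valid for a formula iff it is valid for every clause."""
    return all(is_valid_for_clause(clause, x) for clause in formula)
```

The all-* assignment passes this check for any formula with clause length k ≥ 2, since no variable is unsatisfying.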

For a valid partial assignment, the subset of variables that are assigned either 0 or 1 values can
be divided into constrained and unconstrained variables in the following way: 

Definition 2. We say that a variable x_i is the unique satisfying variable for a clause if it is
assigned s_{a,i} whereas all other variables in the clause (i.e., the variables {x_j : j ∈ V(a)\{i}}) are
assigned u_{a,j}. A variable x_i is constrained by clause a if it is the unique satisfying variable.

We let CON_{i,a}(x_{V(a)}) denote an indicator function for the event that x_i is the unique satisfying
variable in the partial assignment x_{V(a)} for clause a. A variable is unconstrained if it has a 0 or 1
value, and is not constrained. Thus for any partial assignment the variables are divided into stars,
constrained and unconstrained variables. We define the three sets

S_*(x) := \{i \in V : x_i = *\}, \quad S_c(x) := \{i \in V : x_i \text{ constrained}\}, \quad S_o(x) := \{i \in V : x_i \text{ unconstrained}\} \qquad (8)

of *, constrained and unconstrained variables respectively. Finally, we use n_*(x), n_c(x) and n_o(x)
to denote the respective sizes of these three sets.
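
The three sets in equation (8) can be computed in one pass over the clauses. The sketch below assumes the input is already a valid partial assignment, represented as a sequence with entries 0, 1 or "*"; the function name `classify` is hypothetical.

```python
def classify(formula, x):
    """Return (S_star, S_c, S_o) for a valid partial assignment x."""
    n = len(x)
    constrained = set()
    for clause in formula:
        satisfying = [i for i, J in clause if x[i] == 1 - J]
        unsatisfying = [i for i, J in clause if x[i] == J]
        # x_i is constrained if it is the unique satisfying variable of some clause.
        if len(satisfying) == 1 and len(unsatisfying) == len(clause) - 1:
            constrained.add(satisfying[0])
    stars = {i for i in range(n) if x[i] == "*"}
    unconstrained = set(range(n)) - stars - constrained
    return stars, constrained, unconstrained
```

For the clause (x_0 ∨ x_1 ∨ x_2) and the assignment (1, 0, 0, *), variable 0 is the unique satisfying variable and hence constrained, while variables 1 and 2 are unconstrained.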

Various probability distributions can be defined on valid partial assignments by giving different
weights to stars, constrained and unconstrained variables, which we denote by ω_*, ω_c and ω_o
respectively. Since only the ratio of the weights matters, we set ω_c = 1, and treat ω_o and ω_* as
free non-negative parameters (we generally take them in the interval [0, 1]). We define the weights
of partial assignments in the following way: invalid assignments x have weight W(x) = 0, and for
any valid assignment x, we set

W(x) := (\omega_o)^{n_o(x)} \times (\omega_*)^{n_*(x)}. \qquad (9)

Our primary interest is the probability distribution given by p_W(x) ∝ W(x). In contrast to the
earlier distribution p, it is important to observe that this definition is valid for any SAT problem,
whether or not it is satisfiable, as long as ω_* ≠ 0, since the all-* vector is always a valid partial
assignment. Note that if ω_o = 1 and ω_* = 0 then the distribution p_W(x) is the uniform distribution
on satisfying assignments. Another interesting case that we will discuss is that of ω_o = 0 and ω_* = 1,
which corresponds to the uniform distribution over valid partial assignments without unconstrained
variables.
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3.2 Associated Mcirkov random fields 

Given our set-up thus far, it is not at all obvious whether or not the distribution $p_W$ can be
decomposed as a Markov random field based on the original factor graph. Interestingly, we find
that $p_W$ does indeed have such a Markov representation for any choices of $\omega_o, \omega_* \in [0, 1]$. Obtaining
this representation requires the addition of another dimension to our representation, which allows
us to assess whether a given variable is constrained or unconstrained. We define the parent set of
a given variable $x_i$, denoted by $P_i$, to be the set of clauses for which $x_i$ is the unique satisfying
variable. Immediate consequences of this definition are the following:

(a) If $x_i = 0$, then we must have $P_i \subseteq C^-(i)$.

(b) If $x_i = 1$, then there must hold $P_i \subseteq C^+(i)$.

(c) The setting $x_i = *$ implies that $P_i = \emptyset$.

Note also that $P_i = \emptyset$ means that $x_i$ cannot be constrained. For each $i \in V$, let $\mathcal{P}(i)$ be the
set of all possible parent sets of variable $i$. Due to the restrictions imposed by our definition, $P_i$
must be contained in either $C^+(i)$ or $C^-(i)$ but not both. Therefore, the cardinality$^1$ of $\mathcal{P}(i)$ is

$$2^{|C^-(i)|} + 2^{|C^+(i)|} - 1.$$
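As a quick check of this count, the following Python sketch enumerates the admissible parent sets for one variable. The clause labels and the function names are hypothetical, and the sketch assumes that $C^+(i)$ and $C^-(i)$ are disjoint (no clause contains a variable with both signs).

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of a collection, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def parent_sets(c_plus, c_minus):
    """Admissible parent sets: every subset of C^+(i) and every subset of C^-(i)."""
    from_plus = {frozenset(s) for s in subsets(c_plus)}
    from_minus = {frozenset(s) for s in subsets(c_minus)}
    return from_plus | from_minus   # the set union keeps the empty set only once


c_plus = {"a", "b"}          # hypothetical clauses where x_i appears positively
c_minus = {"c", "d", "e"}    # hypothetical clauses where x_i appears negated
all_sets = parent_sets(c_plus, c_minus)
print(len(all_sets), 2 ** len(c_minus) + 2 ** len(c_plus) - 1)
```

The union removes the duplicate empty set, which is exactly the reason for the "minus one" in the formula.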

Our extended Markov random field is defined on the Cartesian product space $\mathcal{X}_1 \times \cdots \times \mathcal{X}_n$,
where $\mathcal{X}_i := \{0, 1, *\} \times \mathcal{P}(i)$. The distribution factorizes as a product of compatibility functions at
the variable and clause nodes of the factor graph, which are defined as follows:

Variable compatibilities: Each variable node $i \in V$ has an associated compatibility function
of the form:

$$\Psi_i(x_i, P_i) := \begin{cases} \omega_o & : P_i = \emptyset,\ x_i \neq * \\ \omega_* & : P_i = \emptyset,\ x_i = * \\ 1 & : \text{for any other valid } (P_i, x_i) \end{cases} \tag{10}$$

The role of these functions is to assign weight to the partial assignments according to the number
of unconstrained and star variables, as in the weighted distribution $p_W$.

Clause compatibilities: The compatibility functions at the clause nodes serve to ensure that
only valid assignments have non-zero probability, and that the parent sets $P_{V(a)} := \{P_i : i \in V(a)\}$
are consistent with the assignments $x_{V(a)} := \{x_i : i \in V(a)\}$ in the neighborhood of $a$. More
precisely, we require that the partial assignment $x_{V(a)}$ is valid for $a$ (i.e., $\mathrm{VAL}_a(x_{V(a)}) = 1$) and
that for each $i \in V(a)$, exactly one of the two following conditions holds:

(a) $a \in P_i$ and $x_i$ is constrained by $a$, or

(b) $a \notin P_i$ and $x_i$ is not constrained by $a$.

The following compatibility function corresponds to an indicator function for the intersection
of these events:

$$\Psi_a(x_{V(a)}, P_{V(a)}) := \mathrm{VAL}_a(x_{V(a)}) \times \prod_{i \in V(a)} \delta\big(\mathrm{Ind}[a \in P_i],\ \mathrm{CON}_{i,a}(x_{V(a)})\big). \tag{11}$$

$^1$ Note that it is necessary to subtract one so as not to count the empty set twice.

We now form a Markov random field over partial assignments and parent sets by taking the product
of variable and clause compatibility functions

$$p_{gen}(x, P) \propto \prod_{i \in V} \Psi_i(x_i, P_i) \prod_{a \in C} \Psi_a(x_{V(a)}, P_{V(a)}). \tag{12}$$

With these definitions, some straightforward calculations show that $p_{gen} = p_W$.

3.3 Survey propagation as an instance of belief propagation

We now consider the form of the belief propagation (BP) updates as applied to the MRF $p_{gen}$
defined by equation (12). We refer the reader to the section defining the BP algorithm
on a general factor graph. The main result of this section is to establish that the SP($\rho$) family
of algorithms are equivalent to belief propagation as applied to $p_{gen}$ with suitable choices of the
weights $\omega_o$ and $\omega_*$. In the interests of readability, most of the technical lemmas will be presented
in the appendix.

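We begin by introducing some notation necessary to describe the BP updates on the extended
MRF. The BP message from clause $a$ to variable $i$, denoted by $M_{a \to i}(\cdot)$, is a vector of length
$|\mathcal{X}_i| = 3 \times |\mathcal{P}(i)|$. Fortunately, due to symmetries in the variable and clause compatibilities defined
in equations (10) and (11), it turns out that the clause-to-variable message can be parameterized
by only three numbers, $\{M^s_{a \to i}, M^u_{a \to i}, M^*_{a \to i}\}$, as follows:

$$M_{a \to i}(x_i, P_i) = \begin{cases} M^s_{a \to i} & \text{if } x_i = s_{a,i},\ P_i = S \cup \{a\} \text{ for some } S \subseteq C^s_a(i), \\ M^u_{a \to i} & \text{if } x_i = u_{a,i},\ P_i \subseteq C^u_a(i), \\ M^*_{a \to i} & \text{if } x_i = s_{a,i},\ P_i \subseteq C^s_a(i) \ \text{ or } \ x_i = *,\ P_i = \emptyset, \end{cases} \tag{13}$$

where $M^s_{a \to i}$, $M^u_{a \to i}$ and $M^*_{a \to i}$ are elements of $[0, 1]$.

Now turning to messages from variables to clauses, it is convenient to introduce the notation
$P_i = S \cup \{a\}$ as a shorthand for the event

$$a \in P_i \quad \text{and} \quad S = P_i \setminus \{a\} \subseteq C^s_a(i),$$

where it is understood that $S$ could be empty. In the appendix we show that the variable-to-clause
message $M_{i \to a}$ is fully specified by values for pairs $(x_i, P_i)$ of six general types:

$$\{(s_{a,i}, S \cup \{a\}),\ (s_{a,i}, \emptyset \neq P_i \subseteq C^s_a(i)),\ (u_{a,i}, \emptyset \neq P_i \subseteq C^u_a(i)),\ (s_{a,i}, \emptyset),\ (u_{a,i}, \emptyset),\ (*, \emptyset)\}.$$

The BP updates themselves are most compactly expressed in terms of particular linear combinations
of such basic messages, defined in the following way:

$$R^s_{i \to a} := \sum_{S \subseteq C^s_a(i)} M_{i \to a}(s_{a,i}, S \cup \{a\}) \tag{14a}$$

$$R^u_{i \to a} := \sum_{P_i \subseteq C^u_a(i)} M_{i \to a}(u_{a,i}, P_i) \tag{14b}$$

$$R^*_{i \to a} := \sum_{P_i \subseteq C^s_a(i)} M_{i \to a}(s_{a,i}, P_i) + M_{i \to a}(*, \emptyset) \tag{14c}$$

Note that $R^s_{i \to a}$ is associated with the event that $x_i$ is the unique satisfying variable for clause
$a$; $R^u_{i \to a}$ with the event that $x_i$ does not satisfy $a$; and $R^*_{i \to a}$ with the event that $x_i$ is neither
unsatisfying nor uniquely satisfying (i.e., either $x_i = *$, or $x_i = s_{a,i}$ but is not the only variable
that satisfies $a$).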

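With this terminology, the BP algorithm on the extended MRF can be expressed in terms of
the following recursions on the triplets $(M^s_{a \to i}, M^u_{a \to i}, M^*_{a \to i})$ and $(R^s_{i \to a}, R^u_{i \to a}, R^*_{i \to a})$:

BP updates on extended MRF:

Messages from clause $a$ to variable $i$:

$$M^s_{a \to i} = \prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a}$$

$$M^u_{a \to i} = \prod_{j \in V(a) \setminus \{i\}} (R^u_{j \to a} + R^*_{j \to a}) + \sum_{k \in V(a) \setminus \{i\}} (R^s_{k \to a} - R^*_{k \to a}) \prod_{j \in V(a) \setminus \{i,k\}} R^u_{j \to a} - \prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a}$$

$$M^*_{a \to i} = \prod_{j \in V(a) \setminus \{i\}} (R^u_{j \to a} + R^*_{j \to a}) - \prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a}$$

Messages from variable $i$ to clause $a$:

$$R^s_{i \to a} = \prod_{b \in C^u_a(i)} M^u_{b \to i} \Big[ \prod_{b \in C^s_a(i)} (M^s_{b \to i} + M^*_{b \to i}) \Big]$$

$$R^u_{i \to a} = \prod_{b \in C^s_a(i)} M^u_{b \to i} \Big[ \prod_{b \in C^u_a(i)} (M^s_{b \to i} + M^*_{b \to i}) - (1 - \omega_o) \prod_{b \in C^u_a(i)} M^*_{b \to i} \Big]$$

$$R^*_{i \to a} = \prod_{b \in C^u_a(i)} M^u_{b \to i} \Big[ \prod_{b \in C^s_a(i)} (M^s_{b \to i} + M^*_{b \to i}) - (1 - \omega_o) \prod_{b \in C^s_a(i)} M^*_{b \to i} \Big] + \omega_* \prod_{b \in C^s_a(i) \cup C^u_a(i)} M^*_{b \to i}$$

We provide a detailed derivation of these BP equations on the extended MRF in the appendix.
Since the messages are interpreted as probabilities, we only need their ratio, and we can normalize
them to any constant. At any iteration, approximations to the local marginals at each variable
node $i \in V$ are given by (up to a normalization constant):

$$F_i(0) \propto \prod_{b \in C^+(i)} M^u_{b \to i} \Big[ \prod_{b \in C^-(i)} (M^s_{b \to i} + M^*_{b \to i}) - (1 - \omega_o) \prod_{b \in C^-(i)} M^*_{b \to i} \Big]$$

$$F_i(1) \propto \prod_{b \in C^-(i)} M^u_{b \to i} \Big[ \prod_{b \in C^+(i)} (M^s_{b \to i} + M^*_{b \to i}) - (1 - \omega_o) \prod_{b \in C^+(i)} M^*_{b \to i} \Big]$$

$$F_i(*) \propto \omega_* \prod_{b \in C^+(i) \cup C^-(i)} M^*_{b \to i}$$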

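The following theorem establishes that the SP($\rho$) family of algorithms is equivalent to belief
propagation on the extended MRF:

Theorem 3. For all $\omega_* \in [0, 1]$, BP updates on the extended $(\omega_*, \omega_o)$-MRF $p_{gen}$ are equivalent
to the SP($\omega_*$) family of algorithms under the following restrictions:

(a) the constraint $\omega_o + \omega_* = 1$ is imposed, and

(b) all messages are initialized such that $M^u_{a \to i} = M^*_{a \to i}$ for every edge $(a, i)$.

Proof. Under the constraint $\omega_o + \omega_* = 1$, if we initialize $M^u_{a \to i} = M^*_{a \to i}$ on every edge, then there
holds $R^s_{i \to a} = R^*_{i \to a}$ and consequently $M^u_{a \to i} = M^*_{a \to i}$ remains true at the next iteration. Initializing
the parameters in this way and imposing the normalization $M^s_{a \to i} + M^*_{a \to i} = 1$ leads to the following
recurrence equations:

$$M^s_{a \to i} = \frac{\prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a}}{\prod_{j \in V(a) \setminus \{i\}} (R^*_{j \to a} + R^u_{j \to a})},$$

where:

$$R^u_{i \to a} = \prod_{b \in C^s_a(i)} (1 - M^s_{b \to i}) \Big[ 1 - \omega_* \prod_{b \in C^u_a(i)} (1 - M^s_{b \to i}) \Big], \qquad R^*_{i \to a} = \prod_{b \in C^u_a(i)} (1 - M^s_{b \to i}).$$

These updates are equivalent to SP($\omega_*$) by setting $\eta_{a \to i} = M^s_{a \to i}$, $\Pi^u_{i \to a} = R^u_{i \to a}$, and
$\Pi^s_{i \to a} + \Pi^*_{i \to a} = R^*_{i \to a}$.

□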
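The key step of this proof can be checked numerically. The sketch below implements the variable-to-clause BP update in Python and verifies, for randomly drawn messages satisfying restrictions (a) and (b) and the normalization $M^s + M^* = 1$, that $R^s = R^*$ and that $R^u$, $R^*$ reduce to products over $(1 - M^s)$. The function name, argument layout and the chosen value $\rho = 0.9$ are assumptions made for this illustration.

```python
import math
import random


def var_to_clause(m_sat, m_unsat, w_o, w_star):
    """Variable-to-clause BP update on the extended MRF.

    m_sat / m_unsat hold one (M^s, M^u, M^*) triple per clause b in
    C^s_a(i) / C^u_a(i). Returns the triple (R^s, R^u, R^*).
    """
    prod = math.prod
    sum_sat = prod(s + st for s, _, st in m_sat)        # prod over C^s of (M^s + M^*)
    sum_unsat = prod(s + st for s, _, st in m_unsat)    # prod over C^u of (M^s + M^*)
    star_sat = prod(st for _, _, st in m_sat)
    star_unsat = prod(st for _, _, st in m_unsat)
    u_sat = prod(u for _, u, _ in m_sat)
    u_unsat = prod(u for _, u, _ in m_unsat)
    r_s = u_unsat * sum_sat
    r_u = u_sat * (sum_unsat - (1 - w_o) * star_unsat)
    r_star = u_unsat * (sum_sat - (1 - w_o) * star_sat) + w_star * star_sat * star_unsat
    return r_s, r_u, r_star


rng = random.Random(0)
rho = 0.9
eta_sat = [rng.random() for _ in range(3)]      # M^s on the clauses in C^s_a(i)
eta_unsat = [rng.random() for _ in range(2)]    # M^s on the clauses in C^u_a(i)
# Restriction (b) together with the normalization: M^u = M^* = 1 - M^s.
m_sat = [(e, 1 - e, 1 - e) for e in eta_sat]
m_unsat = [(e, 1 - e, 1 - e) for e in eta_unsat]
r_s, r_u, r_star = var_to_clause(m_sat, m_unsat, w_o=1 - rho, w_star=rho)

# Closed forms from the recurrence in the proof.
sp_r_u = math.prod(1 - e for e in eta_sat) * (1 - rho * math.prod(1 - e for e in eta_unsat))
sp_r_star = math.prod(1 - e for e in eta_unsat)
print(r_s, r_star, sp_r_star)
print(r_u, sp_r_u)
```

The equality $R^s = R^*$ relies on $1 - \omega_o = \omega_*$, which is where restriction (a) enters.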



Remarks:

1. Theorem 3 is a generalization of the result of Braunstein and Zecchina [7], who showed that
SP(1) is equivalent to belief propagation on a certain MRF.

2. The essence of Theorem 3 is that the pure survey propagation algorithm, as well as all the
$\rho$-variants thereof, are all equivalent to belief propagation on our extended MRF with suitable
parameter choices. This equivalence is important for a number of reasons:

(a) Belief propagation is a widely-used algorithm for computing approximations to marginal
distributions in general Markov random fields [24]. It also has a variational interpretation
as an iterative method for attempting to solve a non-convex optimization problem
based on the Bethe approximation. Among other consequences, this variational
interpretation leads to other algorithms that also solve the Bethe problem, but unlike
belief propagation, are guaranteed to converge.

(b) Given the link between SP and extended MRFs, it is natural to study combinatorial and
probabilistic properties of the latter objects. In Section 4, we show how so-called "cores"
arise as fixed points of SP(1), and we prove a weight-preserving identity that shows how
the extended MRF for general $\rho$ is a "smoothed" version of the naive MRF.

(c) Finally, since BP (and hence SP) is computing approximate marginals for the MRF, it
is natural to study other ways of computing marginals and examine if these lead to an
effective way for solving random $k$-SAT problems. We begin this study in Section 4.5.

3. The initial messages have very small influence on the behavior of the algorithm, and they are
typically chosen to be uniform random variables in $(0, 1)$. In practice, for $\omega_o + \omega_* = 1$, if we
start with different values for $M^u_{a \to i}$ and $M^*_{a \to i}$ they soon converge to become equal.

4. If we restrict our attention to 3-SAT, the equations have a simpler form. In particular, for a
clause $a$ on $x_i, x_j, x_k$, the messages to variable node $i$ are:

$$M^u_{a \to i} = R^*_{j \to a} R^*_{k \to a} + R^s_{j \to a} R^u_{k \to a} + R^u_{j \to a} R^s_{k \to a}$$

$$M^*_{a \to i} = R^*_{j \to a} R^*_{k \to a} + R^*_{j \to a} R^u_{k \to a} + R^u_{j \to a} R^*_{k \to a}.$$
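The reduction in Remark 4 can be checked against the general clause-to-variable update. The following Python sketch (function and variable names are our own) evaluates the general formulas for a clause with two other variables $j, k$ and compares them with the expanded 3-SAT expressions on random inputs.

```python
import math
import random


def clause_to_var(r_s, r_u, r_star):
    """General clause-to-variable update; list entries index j in V(a) minus {i}."""
    n = len(r_u)
    prod_u = math.prod(r_u)
    prod_u_star = math.prod(u + st for u, st in zip(r_u, r_star))
    # sum over k of (R^s_k - R^*_k) * prod over j != k of R^u_j
    correction = sum(
        (r_s[k] - r_star[k]) * math.prod(r_u[j] for j in range(n) if j != k)
        for k in range(n)
    )
    m_s = prod_u
    m_u = prod_u_star + correction - prod_u
    m_star = prod_u_star - prod_u
    return m_s, m_u, m_star


rng = random.Random(1)
r_s = [rng.random(), rng.random()]       # (R^s_j, R^s_k)
r_u = [rng.random(), rng.random()]       # (R^u_j, R^u_k)
r_star = [rng.random(), rng.random()]    # (R^*_j, R^*_k)

m_s, m_u, m_star = clause_to_var(r_s, r_u, r_star)

# Expanded 3-SAT forms from Remark 4.
m_u_3sat = r_star[0] * r_star[1] + r_s[0] * r_u[1] + r_u[0] * r_s[1]
m_star_3sat = r_star[0] * r_star[1] + r_star[0] * r_u[1] + r_u[0] * r_star[1]
print(m_u, m_u_3sat)
print(m_star, m_star_3sat)
```

The two evaluations agree up to floating-point rounding because the $R^u_j R^u_k$ and $R^* R^u$ cross terms cancel in the general expression.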



4 Combinatorial properties 

This section is devoted to an investigation of the combinatorial properties associated with the family
of extended Markov random fields defined in the previous section. We begin by defining an acyclic
directed graph on all valid partial assignments. Of particular interest are the minimal elements in
the resulting partial ordering. We refer to these as cores.

4.1 Directed graph and partial ordering

The vertex set of the directed graph $G$ consists of all valid partial assignments. The edge set
is defined in the following way: for a given pair of valid partial assignments $x$ and $y$, the graph
includes a directed edge from $x$ to $y$ if there exists an index $i \in V$ such that (i) $x_j = y_j$ for all $j \neq i$;
and (ii) $y_i = *$ and $x_i \neq y_i$. We label the edge between $x$ and $y$ with the index $i$, corresponding to
the fact that $y$ is obtained from $x$ by adding one extra $*$ in position $i$.

This directed graph $G$ has a number of properties:

(a) Valid partial assignments can be separated into different levels based on their number of
star variables. In particular, assignment $x$ is in level $n_*(x)$. Thus, every edge is from an
assignment in level $l - 1$ to one in level $l$, where $l$ is at most $n$.

(b) The out-degree of any valid partial assignment $x$ is exactly equal to its number of
unconstrained variables $n_o(x)$.

(c) It is an acyclic graph so that its structure defines a partial ordering; in particular, we write
$y < x$ if there is a directed path in $G$ from $x$ to $y$ (and $y \leq x$ if, in addition, $y = x$ is allowed).
Notice that all directed paths from $x$ to $y$ are labeled by indices in the set
$T = \{i \in V : x_i \neq y_i = *\}$, and only the order in which they appear is different.

Given the partial ordering defined by $G$, it is natural to consider elements that are minimal
in this partial ordering. For any valid partial assignment $x$ and a subset $S \subseteq V$, let $\gamma_S(x)$ be the
minimal $y \leq x$ such that the path from $x$ to $y$ is labeled only by indices in $S$. In particular, $\gamma_V(x)$
is a minimal assignment in the order. It is easy to show that there always exists a unique $\gamma_S(x)$.

Proposition 4. For any valid assignment $x$ and $S \subseteq V$, there is a unique minimal $y \leq x$ such that
the path from $x$ to $y$ is labeled only by indices in $S$. Furthermore, $S_o(y) \cap S = \emptyset$ and $S_*(y) = S_*(x) \cup T$,
where $T \subseteq S$ is the set of labels on any path from $x$ to $y$.

Proof. To prove the second assertion in the proposition statement for a minimal $y$, suppose there
exists $i \in S \cap S_o(y)$. Then there must be an outgoing edge from $y$ labeled by an element in $S$,
which contradicts the assumed minimality of $y$. The equivalence $S_*(y) = S_*(x) \cup T$ follows directly
from the definition of $G$ and its edge labels.

To establish the uniqueness statement, suppose that there are two minimal such assignments $y_1$
and $y_2$, and the paths from $x$ to $y_1$ and $y_2$ are labeled by sets of indices $T_1, T_2 \subseteq S$ respectively. If
$T_1 = T_2$ then $y_1 = y_2$, so let us assume that $T_1$ and $T_2$ are distinct. Without loss of generality, we
may take $T_1 \setminus T_2 \neq \emptyset$. Consider a particular path from $x$ to $y_1$, with labels $t_1, t_2, \ldots, t_r$, where $r = |T_1|$.
Let $t_i$ be the first label such that $t_i \notin T_2$. Then its corresponding variable is unconstrained when the
variables indexed by $\{t_1, \ldots, t_{i-1}\} \cup S_*(x) \subseteq T_2 \cup S_*(x)$ are assigned $*$, therefore it is unconstrained
in $y_2$. This implies that there exists an edge out of $y_2$ that is labeled by $t_i \in S$, which contradicts
the assumption that $y_2$ is minimal. □

We define a core assignment to be a valid partial assignment $y \in \{0, 1, *\}^n$ that contains no
unconstrained variables. We say that a core assignment $y$ is non-trivial if $n_*(y) < n$, so that it
has at least one constrained $\{0, 1\}$ variable. Under this definition, it follows that for any partial
assignment $x$, the associated minimal element $\gamma_V(x)$ is a core assignment.

Given a valid ordinary assignment $z \in \{0, 1\}^n$, an interesting object is the subgraph of partial
assignments that lie below it in the partial ordering. It can be seen that any pair of elements in
this subgraph have both a unique maximal element and a unique minimal element, so that any
such subgraph is a lattice.

In the examples shown in Figure 5, only a subset of the partial assignments is shown, since even for
small formulas the space of partial assignments is quite large. For the first formula all satisfying
assignments have a trivial core. For the second one, on the other hand, there are assignments with
non-trivial cores.

4.2 Pure survey propagation as a peeling procedure

As a particular case of Theorem 3, setting $\omega_* = 1$ and $\omega_o = 0$ yields the extended MRF that
underlies the SP(1) algorithm. In this case, the only valid assignments with positive weight are
those without any unconstrained variables, namely core assignments. Thus, the distribution $p_W$
for $(\omega_o, \omega_*) = (0, 1)$ is simply uniform over the core assignments. The following result connects
fixed points of SP(1) to core assignments:

Proposition 5. For a valid assignment $x$, let SP(1) be initialized by:

$$\Pi^u_{i \to a} = \delta(x_i, u_{a,i}), \qquad \Pi^s_{i \to a} = \delta(x_i, s_{a,i}).$$

Then within a finite number of steps, the algorithm converges and the output fields are

$$\mu_i(b) = \delta(y_i, b),$$

where $y = \gamma_V(x)$ and $b \in \{0, 1, *\}$.

Proof. We say that a variable $i$ belongs to the core if $y_i \neq *$. We say that a clause $a$ belongs to the
core if all the variables in the clause belong to the core. We first show by induction that

I. If $a$ and $i$ belong to the core and $y_i$ is not the unique satisfying variable for $a$, then
$\Pi^u_{i \to a} = \delta(x_i, u_{a,i})$ and $\Pi^s_{i \to a} = \delta(x_i, s_{a,i})$, and

II. If $a$ and $i$ belong to the core and $y_i$ is the unique satisfying variable for $a$, then $\eta_{a \to i} = 1$.

Clearly, property I holds at time 0. Therefore, it suffices to prove that if property I holds at time
$t$ then so does II, and that if property II holds at time $t$ then property I holds at time $t + 1$.

Suppose that property I holds at time $t$. Let $a$ and $i$ belong to the core such that $y_i$ is the
unique satisfying variable of the clause $a$. By the induction hypothesis, for all $j \in V(a) \setminus \{i\}$ it holds
that $\Pi^u_{j \to a} = \delta(x_j, u_{a,j}) = 1$. This implies that $\eta_{a \to i} = 1$ as needed.

Suppose that property II holds at time $t$. Let $a$ and $i$ belong to the core such that $y_i$ is not
unique satisfying for $a$. By the assumption, it follows that there exists $b$ which belongs to the core
such that $y_i$ is the unique satisfying variable for $b$. This implies by the induction hypothesis that
$\eta_{b \to i} = 1$. It is now easy to see that at update $t + 1$: $\Pi^u_{i \to a} = \delta(x_i, u_{a,i})$ and $\Pi^s_{i \to a} = \delta(x_i, s_{a,i})$.
Note that the claim above implies that for all times $t$ and all $i$ such that $y_i \neq *$, it holds that
$\mu_i(b) = \delta(y_i, b)$.

Let $i_1, i_2, \ldots, i_s$ be a "peeling-path" from $x$ to $y$. In other words, the variable $i_1$ is not uniquely
satisfying any clause. Once this variable is set to $*$, the variable $i_2$ is not uniquely satisfying any
clause, etc. We claim that for all $1 \leq t \leq s$, for all updates after time $t$ and for all clauses $a$ such
that $i_t \in V(a)$, it holds that $\eta_{a \to i_t} = 0$. The proof follows easily by induction on $t$. This in turn
implies that for all updates after time $t$, $\mu_{i_t}(b) = \delta(b, *)$, from which the result follows. □

Figure 5. Portion of the directed graph on partial assignments for two different formulas: (a) a
formula with two clauses, over $(x_1, x_2, x_3)$ and $(x_2, x_3, x_4)$ (literal signs as shown in the figure).
Highlighted is the lattice below the satisfying assignment $z = (1, 1, 1, 1)$, whose core is trivial
(i.e., $\gamma_V(z) = (*, *, *, *)$). (b) a formula with five clauses, three over $(x_1, x_2, x_3)$, one over
$(x_2, x_3, x_5)$ and one over $(x_1, x_5, x_4)$ (literal signs as shown in the figure). The satisfying
assignment $z = (0, 0, 0, 0, 1)$ has the non-trivial core $\gamma_V(z) = (0, 0, 0, *, *)$. For the same formula
there are other satisfying assignments, e.g. $(1, 0, 1, 0, 1)$, which have a trivial core.

Thus, SP(1), when suitably initialized, simply strips the valid assignment $x$ down to its core
$\gamma_V(x)$. Moreover, Proposition 5 in conjunction with Theorem 3 leads to viewing the pure form
of survey propagation SP(1) as performing an approximate marginalization over cores. Therefore,
our results raise the crucial question: do cores exist for random formulas? Motivated by this
perspective, Achlioptas and Ricci-Tersenghi [2] have answered this question affirmatively for $k$-SAT
with $k \geq 9$. In Section 5, we show that cores, if they exist, must be "large" in a suitable sense
(see Proposition 8). In the following section, we explore the case $k = 3$ via experiments on large
instances.

4.3 Peeling experiments 

We have performed a large number of the following experiments: 

1. starting with a satisfying assignment x, change a random one of its unconstrained variables 
to *, 

2. repeat until there are no unconstrained variables. 

This procedure, which we refer to as "peeling", is equivalent to taking a random path from $x$ in
$G$, by choosing at each step a random outgoing edge. Any such path terminates at the core $\gamma_V(x)$.
It is interesting to examine at each step of this process the number of unconstrained variables
(equivalently, the number of outgoing edges in the graph $G$). For $k = 3$ SAT problems, Figure 6
shows the results of such experiments for $n = 100{,}000$, using different values of $\alpha$. The plotted
curves are the evolution of the number of unconstrained variables as the number of $*$'s increases.
We note that for $n = 100$ and $\alpha$ close to the threshold, satisfying assignments often correspond to
core assignments; a similar observation was also made by Braunstein and Zecchina. In contrast,
for larger $n$, this correspondence is rarely the case. Rather, the generated curves suggest that $\gamma_V(x)$
is almost always the all-$*$ assignment, and moreover that for high density $\alpha$, there is a critical level
in $G$ where the out-degrees are very low. Increasing $\alpha$ results in failure of the algorithm itself,
rather than in the formation of real core assignments.

Figure 6. Evolution of the number of unconstrained variables in the peeling process: start with a
satisfying assignment, change a random unconstrained variable to $*$ and repeat. Plotted is the result
of an experiment for $n = 100{,}000$, for random formulas with $k = 3$ and $\alpha \in \{2, 2.5, 3, 3.5, 4, 4.1, 4.2\}$.
In particular, core assignments are on the $x$-axis, and satisfying assignments are on the $y$-axis.
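The peeling procedure described above can be sketched in a few lines of Python. The clause encoding (lists of `(variable, is_positive)` pairs), the function names and both toy formulas are assumptions made for this illustration; they are not the formulas used in the experiments.

```python
import random

STAR = "*"


def satisfies(value, positive):
    """True if a 0/1 value makes the literal true; a star never satisfies."""
    return value != STAR and (value == 1) == positive


def is_constrained(i, x, clauses):
    """True if x[i] is the unique satisfying variable of some clause."""
    for clause in clauses:
        signs = dict(clause)
        if i in signs and satisfies(x[i], signs[i]):
            if all(x[v] != STAR and not satisfies(x[v], pos)
                   for v, pos in clause if v != i):
                return True
    return False


def peel(x, clauses, rng):
    """Repeatedly turn a random unconstrained variable into a star; return the core."""
    x = list(x)
    while True:
        free = [i for i, v in enumerate(x)
                if v != STAR and not is_constrained(i, x, clauses)]
        if not free:
            return x
        x[rng.choice(free)] = STAR


rng = random.Random(0)

# Hypothetical formula whose satisfying assignment (1, 1, 1) peels to the all-star core.
one_clause = [[(0, True), (1, True), (2, True)]]
trivial_core = peel([1, 1, 1], one_clause, rng)

# Hypothetical formula in which x0, x1, x2 each uniquely satisfy one clause under (0, 0, 0, 0).
core_formula = [
    [(0, False), (1, True), (2, True)],
    [(0, True), (1, False), (2, True)],
    [(0, True), (1, True), (2, False)],
    [(0, False), (1, True), (3, True)],
]
nontrivial_core = peel([0, 0, 0, 0], core_formula, rng)
print(trivial_core, nontrivial_core)
```

By Proposition 4 the end point does not depend on the random choices, so the random generator only affects the order in which variables are peeled.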

For $k = 2$, the event that there is a path in $G$ from a satisfying assignment to the all-$*$
assignment has a very natural interpretation. In particular, it is equivalent to the event that
the pure-literal rule succeeds in finding an assignment. The pure-literal rule is an algorithm
consisting of the following steps: assign 1 to a variable if it only appears positively in a clause,
and 0 if it only appears negatively in a clause, reduce the formula, and repeat the procedure. It
is straightforward to check that the sequence of variables given by the labels on any path from
the all-$*$ assignment to a satisfying assignment can be identified with a sequence of steps of the
pure-literal type. Furthermore, it is known [36] that there is a phase transition for the event that
the pure-literal rule succeeds at $\alpha = 1$.

Interestingly, as mentioned earlier, for $k \geq 9$ there are values for $\alpha < \alpha_c$ such that this peeling
procedure provably results in a non-trivial core assignment with high probability, according to [2].
The fact that we do not observe core assignments for $k = 3$, and yet the algorithm is successful,
means that an alternative explanation is required. Accordingly, we propose studying the behavior of
SP($\rho$) for $\rho \in (0, 1)$. Our experimental results, consistent with similar reports from Kirkpatrick [23],
show that SP($\rho$) tends to be most effective in solving $k$-SAT for values of $\rho < 1$. If so, the good
behavior of SP(1) may well follow from the similarity of SP(1) updates to SP($\rho$) updates for $\rho \approx 1$.
To further explore this issue, the effects of varying the weight distribution $(\omega_o, \omega_*)$, and consequently
the parameter $\rho$, are discussed in the following section.

4.4 Weight distribution and smoothing 

One of the benefits of our analysis is that it suggests a large pool of algorithms to be investigated.
One option is to vary the values of $\omega_o$ and $\omega_*$. A "good" setting of these parameters should
place significant weight on precisely those valid assignments that can be extended to satisfying
assignments. At the same time, the parameter setting clearly affects the level of connectivity in the
space of valid assignments. Connectivity most likely affects the performance of belief propagation,
as well as any other algorithm that we may apply to compute marginals or sample from the
distribution.

Figure 7. Performance of BP for different choices of $(\omega_o, \omega_*)$ as applied to a particular randomly
chosen formula with $n = 10{,}000$, $k = 3$, $\alpha = 4.2$. Four distinct cases can be distinguished: (i) BP
converges and the decimation steps yield a complete solution, (ii) BP converges and the decimation
steps yield a partial solution, completed by using WalkSAT, (iii) BP converges, but the decimation
steps don't lead to a solution, and (iv) BP does not converge.

Figure 7 shows the performance of belief propagation on the extended MRF for different
values of $(\omega_o, \omega_*)$, applied to a particular random formula with $n = 10{,}000$, $k = 3$ and $\alpha = 4.2$.
The most successful pairs in this case were (0.05, 0.95), (0.05, 0.9), (0.05, 0.85), and (0.05, 0.8). For
these settings of the parameters the decimation steps reached a solution, so a call to WalkSAT
was not needed. For weights satisfying $\omega_o + \omega_* > 1$, the behavior is very predictable: although
the algorithm converges, the choices that it makes in the decimation steps lead to a contradiction.
Note that there is a sharp transition in algorithm behavior as the weights cross the line $\omega_o + \omega_* = 1$,
which is representative of the more general behavior.

The following result provides some justification for the excellent performance in the regime
$\omega_o + \omega_* \leq 1$.

Theorem 6. If $\omega_o + \omega_* = 1$, then $\sum_{y \leq x} W(y) = \omega_*^{n_*(x)}$ for any valid assignment $x$. If $\omega_o + \omega_* < 1$,
then $\sum_{y \leq x} W(y) \geq (\omega_*)^{n_*(x)}$ for any valid assignment $x$.

It should be noted that Theorem 6 has a very natural interpretation in terms of a "smoothing"
operation. In particular, the $(\omega_o, \omega_*)$-MRF may be regarded as a smoothed version of the uniform
distribution over satisfying assignments, in which the uniform weight assigned to each satisfying
assignment is spread over the lattice associated with it.$^2$
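The equality case of this identity ($\omega_o + \omega_* = 1$) can be checked by brute force on small formulas. The Python sketch below enumerates every assignment reachable from $x$ by peeling and sums the weights from equation (9). The clause encoding, the function names, the two toy formulas and the weights $(0.3, 0.7)$ are assumptions made for this illustration.

```python
import math

STAR = "*"


def satisfies(value, positive):
    """True if a 0/1 value makes the literal true; a star never satisfies."""
    return value != STAR and (value == 1) == positive


def is_constrained(i, x, clauses):
    """True if x[i] is the unique satisfying variable of some clause."""
    for clause in clauses:
        signs = dict(clause)
        if i in signs and satisfies(x[i], signs[i]):
            if all(x[v] != STAR and not satisfies(x[v], pos)
                   for v, pos in clause if v != i):
                return True
    return False


def unconstrained(x, clauses):
    """Indices of 0/1 variables that are not constrained (the out-edges of x in G)."""
    return [i for i, v in enumerate(x)
            if v != STAR and not is_constrained(i, x, clauses)]


def reachable(x, clauses):
    """All y <= x: assignments reachable from the valid assignment x by peeling."""
    start = tuple(x)
    seen, stack = {start}, [start]
    while stack:
        y = stack.pop()
        for i in unconstrained(y, clauses):
            z = y[:i] + (STAR,) + y[i + 1:]
            if z not in seen:
                seen.add(z)
                stack.append(z)
    return seen


def total_weight(x, clauses, w_o, w_star):
    """Sum of W(y) over all y <= x, with W as in equation (9)."""
    total = 0.0
    for y in reachable(x, clauses):
        n_star = sum(v == STAR for v in y)
        total += (w_o ** len(unconstrained(y, clauses))) * (w_star ** n_star)
    return total


one_clause = [[(0, True), (1, True), (2, True)]]
core_formula = [
    [(0, False), (1, True), (2, True)],
    [(0, True), (1, False), (2, True)],
    [(0, True), (1, True), (2, False)],
    [(0, False), (1, True), (3, True)],
]
print(total_weight((1, 1, 1), one_clause, 0.3, 0.7))        # x has no stars
print(total_weight((STAR, 1, 1), one_clause, 0.3, 0.7))     # x has one star
print(total_weight((0, 0, 0, 0), core_formula, 0.3, 0.7))   # x lies above a non-trivial core
```

In each case the total should equal $\omega_*^{n_*(x)}$ up to rounding, whether or not the core below $x$ is trivial.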

The remainder of this section is devoted to the proof of Theorem 6.

$^2$ Note, however, that any partial assignment that belongs to two or more lattices is assigned a weight only once.
Otherwise, the transformation would be a convolution operation in a strict sense.

Proof. We start with the case $\omega_o + \omega_* = 1$. Let $A$ denote the set of partial assignments $z$ such
that $z_j \in \{x_j, *\}$ for all $j \in V$. We refer to these as the set of assignments consistent with $x$. Let
$B = \{y : y \leq x\}$ be the set of valid assignments that are reachable from $x$. Notice that all $y \in B$
are valid and consistent with $x$, but not every valid assignment in $A$ is reachable from $x$. We will
let $S_*(z)$ denote the set of variables assigned $*$ both for valid and invalid assignments $z$.

Figure 8. The directed graph $G$ and the map $\sigma$ for a formula with two clauses, over $(x_1, x_2, x_3)$ and
$(x_2, x_3, x_4)$ (literal signs as shown in the figure), and the satisfying assignment $(0, 0, 1, 0)$. The solid
arrows denote edges in $G$ and the dashed arrows denote $\sigma$.

We define a map between all assignments consistent with $x$ and the set of reachable ones. Let
$\sigma : A \to B$ be defined as

$$\sigma(z) := \gamma_{S_*(z)}(x).$$

Notice that if $y \in B$ then $\sigma(y) = y$. The map is, of course, many-to-one. We define what we'll
show is the reverse map. For $y \in B$ let

$$T(y) := \{z \in A : S_*(z) = S_*(y) \cup T,\ T \subseteq S_c(y)\}.$$

Lemma 7. For any $y \in B$ and $z \in A$, $z \in T(y)$ if and only if $\sigma(z) = y$.

Proof. Let $z \in T(y)$ so that $S_*(z) = S_*(y) \cup T$ for some $T \subseteq S_c(y)$. $\sigma(z) = \gamma_{S_*(z)}(x)$ is the minimal
valid assignment such that the path from $x$ to it is labeled only by elements in $S_*(z)$. We'll show
that $y$ satisfies these properties, and therefore by Proposition 4, $y = \sigma(z)$. Any path from $x$ to $y$
(which exists since $y \in B$) is labeled by $S_*(y) \setminus S_*(x) \subseteq S_*(z)$. Furthermore, for every $i \in S_*(z)$,
$i \notin S_o(y)$, so there is no outgoing edge from $y$ labeled by an element in $S_*(z)$. Therefore $y$ is
minimal.

Let $y = \sigma(z) = \gamma_{S_*(z)}(x)$. By Proposition 4 there is no $i \in S_*(z)$ such that $i \in S_o(y)$. Therefore
$S_*(z) \subseteq S_*(y) \cup S_c(y)$. Further we have that $S_*(y) \subseteq S_*(z) \cup S_*(x) = S_*(z)$, therefore $S_*(z) =
S_*(y) \cup T$ for some $T \subseteq S_c(y)$. Hence $z \in T(y)$. □

For a set of partial assignments $X$ let $W(X) = \sum_{x \in X} W(x)$. Let $W^\emptyset(z) = (\omega_*)^{n_*(z)} \times
(\omega_o)^{n - n_*(z)}$ denote the weight of any partial assignment, if the formula had no clauses. For such
a formula all partial assignments are valid. Observe that if we restrict our attention to the
assignments that are consistent with $x$,

$$W^\emptyset(A) = \sum_{S \subseteq V \setminus S_*(x)} (\omega_*)^{n_*(x) + |S|} (\omega_o)^{n - n_*(x) - |S|} = (\omega_*)^{n_*(x)} (\omega_* + \omega_o)^{n - n_*(x)} = (\omega_*)^{n_*(x)}.$$

We show that when clauses are added to the formula, the total weight under $x$ is preserved
as long as $x$ is still valid. In particular when an assignment $z$ that is consistent with $x$ becomes
invalid, it passes its weight to an assignment that is still valid, namely $\sigma(z)$, which has fewer $*$
variables than $z$.

$$\begin{aligned} W(y) &= (\omega_*)^{n_*(y)} \times (\omega_o)^{n_o(y)} \times 1^{n_c(y)} \\ &= (\omega_*)^{n_*(y)} \times (\omega_o)^{n_o(y)} \times (\omega_* + \omega_o)^{n_c(y)} \\ &= \sum_{T \subseteq S_c(y)} W^\emptyset(z : S_*(z) = S_*(y) \cup T) \\ &= W^\emptyset(\{z : S_*(z) = S_*(y) \cup T,\ T \subseteq S_c(y)\}) \\ &= W^\emptyset(T(y)). \end{aligned} \tag{17}$$

Finally, we have:

$$\sum_{y \leq x} W(y) = \sum_{y \leq x} W^\emptyset(T(y)) = W^\emptyset(A) = (\omega_*)^{n_*(x)},$$

where we used the fact that the sets $T(y)$ for $y \in B$ partition $A$ by Lemma 7.

The proof of the case $\omega_o + \omega_* < 1$ is similar except that equation (17) becomes an inequality:

$$W(y) = (\omega_*)^{n_*(y)} \times (\omega_o)^{n_o(y)} \times 1^{n_c(y)} \geq \sum_{T \subseteq S_c(y)} W^\emptyset(z : S_*(z) = S_*(y) \cup T).$$

When an assignment $z$ that is consistent with $x$ becomes invalid, it passes more than its own weight
to $\sigma(z)$. □

4.5 Gibbs sampling 

Based on our experiments, the algorithm SP($\rho$) is very effective for appropriate choices of the
parameter $\rho$. The link provided by Theorem 3 suggests that the distribution $p_W$, for which SP($\rho$)
(as an instantiation of belief propagation on the extended MRF) is computing approximate marginals,
must possess good "smoothness" properties. One expected consequence of such "smoothness" is that
algorithms other than BP should also be effective in computing approximate marginals.
Interestingly, rigorous conditions that imply (rapid) convergence of BP (namely, uniqueness of Gibbs
measures on the computation tree) are quite similar to conditions implying rapid convergence of
Gibbs samplers, which are often expressed in terms of "uniqueness", "strong spatial mixing", and
"extremality" (see, for example, [27]).

In this section, we explore the application of sampling methods to the extended MRF as a means
of computing unbiased stochastic approximations to the marginal distributions, and hence biases at
each variable. More specifically, we implemented a Gibbs sampler for the family of extended MRFs
developed in Section 3. The Gibbs sampler performs a random walk over the configuration space
of the extended MRF, that is, on the space of partial valid assignments. Each step of the random
walk entails picking a variable $x_i$ uniformly at random, and updating it randomly to a new value
$b \in \{0, 1, *\}$ according to the conditional probability $p_W(x_i = b \mid (x_j : j \neq i))$. By the construction
of our extended MRF (see equation (12)), this conditional probability is an (explicit) function of
the variables $x_j$ such that $x_j$ and $x_i$ appear together in a clause, and of the variables $x_k$ such that
$x_k$ and $x_j$ appear together in a clause, where $x_j$ and $x_i$ appear together in a clause.

    SAT $\alpha$     Gibbs $\rho$:   0.4      0.5      0.7      0.9
    4.2                  0.0493   0.1401   0.3143   0.4255
    4.1                  0.0297   0.1142   0.3015   0.4046
    4.0                  0.0874   0.0416   0.2765   0.3873
    3.8                  0.4230   0.4554   0.1767   0.0737
    3.6                  0.4032   0.4149   0.1993   0.0582
    3.4                  0.4090   0.4010   0.2234   0.0821

    SAT $\alpha$     Gibbs $\rho$:   0.4      0.5      0.7      0.9
    4.2                  0.0440   0.1462   0.3166   0.4304
    4.1                  0.0632   0.0373   0.2896   0.4119
    4.0                  0.0404   0.0666   0.2755   0.3984
    3.8                  0.1073   0.0651   0.2172   0.3576
    3.6                  0.1014   0.0922   0.1620   0.3087
    3.4                  0.3716   0.3629   0.1948   0.0220

(c) Comparison to SP(0.7)    (d) Comparison to SP(0.5)

Figure 9. Comparison of SP($\beta$) pseudomarginals for $\beta \in \{0.95, 0.9, 0.7, 0.5\}$ to marginals estimated
by Gibbs sampling on weighted MRFs with $\rho \in \{0.4, 0.5, 0.7, 0.9\}$ for the range of SAT problems
$\alpha \in \{4.2, 4.1, 4.0, 3.8, 3.6, 3.4\}$. Each entry in each table shows the average $\ell_1$ error between the biases
computed from the SP($\beta$) pseudomarginals compared to the biases computed from Gibbs sampling
applied to MRF($\rho$). Calculations were based on the top 50 most biased nodes on a problem of size
$n = 1000$. The bold entry within each row (corresponding to a fixed $\alpha$) indicates the MRF($\rho$) that
yields the smallest error in comparison to the SP biases.
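A single step of such a Gibbs sampler can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the clause encoding, the function names and the toy formula are ours, the validity rule is our reading of the paper's definition of valid partial assignments, and the sketch recomputes the full weight $W$ for each candidate value instead of using the local form of the conditional. It assumes $\omega_o, \omega_* > 0$ and a valid starting assignment, so that at least one candidate has positive weight.

```python
import random

STAR = "*"


def satisfies(value, positive):
    """True if a 0/1 value makes the literal true; a star never satisfies."""
    return value != STAR and (value == 1) == positive


def is_valid(x, clauses):
    """Assumed validity rule: each clause has a satisfying variable or >= 2 stars."""
    for clause in clauses:
        n_sat = sum(satisfies(x[v], pos) for v, pos in clause)
        n_star = sum(x[v] == STAR for v, _ in clause)
        if n_sat == 0 and n_star < 2:
            return False
    return True


def is_constrained(i, x, clauses):
    """True if x[i] is the unique satisfying variable of some clause."""
    for clause in clauses:
        signs = dict(clause)
        if i in signs and satisfies(x[i], signs[i]):
            if all(x[v] != STAR and not satisfies(x[v], pos)
                   for v, pos in clause if v != i):
                return True
    return False


def weight(x, clauses, w_o, w_star):
    """W(x) as in equation (9); invalid assignments get weight 0."""
    if not is_valid(x, clauses):
        return 0.0
    n_star = sum(v == STAR for v in x)
    n_o = sum(v != STAR and not is_constrained(i, x, clauses)
              for i, v in enumerate(x))
    return (w_o ** n_o) * (w_star ** n_star)


def gibbs_step(x, clauses, w_o, w_star, rng):
    """Resample one uniformly chosen variable from its conditional under p_W."""
    i = rng.randrange(len(x))
    candidates = [0, 1, STAR]
    weights = []
    for b in candidates:
        y = list(x)
        y[i] = b
        weights.append(weight(y, clauses, w_o, w_star))   # proportional to p_W(x_i = b | rest)
    new_x = list(x)
    new_x[i] = rng.choices(candidates, weights=weights, k=1)[0]
    return new_x


# Hypothetical two-clause formula: (x0 or x1 or x2) and (not x1 or x2 or x3).
demo_clauses = [[(0, True), (1, True), (2, True)],
                [(1, False), (2, True), (3, True)]]
rng = random.Random(0)
state = [1, 1, 1, 1]
for _ in range(100):
    state = gibbs_step(state, demo_clauses, 0.3, 0.7, rng)
print(state, is_valid(state, demo_clauses))
```

Because invalid candidates receive weight zero, the walk never leaves the space of valid partial assignments.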

It is of interest to compare the approximate marginals computed by the SP($\beta$) family of
algorithms (to which we refer as pseudomarginals) to the (stochastic) estimates computed by the Gibbs
sampler. Given the manner in which the SP pseudomarginals are used in the decimation
procedure, the most natural comparison is between the biases $\mu_i(0) - \mu_i(1)$ provided by the SP($\beta$)
algorithm, and the biases $\tau_i(0) - \tau_i(1)$ associated with the Gibbs sampler (where $\tau_i$ are the
approximate marginals obtained from Gibbs sampling on the extended MRF with parameter $\rho$, denoted
MRF($\rho$)). The results of such comparisons for the SP parameter $\beta \in \{0.95, 0.9, 0.7, 0.5\}$ and the
Gibbs sampling parameter $\rho \in \{0.4, 0.5, 0.7, 0.9\}$ are shown in Figure 9. Comparisons are made for
each pair $(\beta, \rho)$ in these sets, and over a range of clause densities $\alpha \in \{4.2, 4.1, 4.0, 3.8, 3.6, 3.4\}$. For
fairly dense formulas (e.g., $\alpha \geq 4.0$), the general trend is that the SP($\beta$) biases with larger $\beta$ agree
most closely with the Gibbs biases with $\rho$ relatively smaller (i.e., $\rho < \beta$). For lower clause densities
(e.g., $\alpha = 3.4$), the agreement between the SP($\beta$) and Gibbs biases on MRF($\rho$) when $\beta = \rho$ is
substantially closer.



5 Expansion arguments for random formulas 

This section is devoted to the study of properties of the MRF on random formulas. We will
use simple random graph arguments in order to obtain typical properties of cores, as well as the
behavior of Gibbs sampling or message-passing algorithms applied to the MRF associated with a
randomly chosen formula. Throughout this section, we use $p^\phi$ to denote the MRF distribution
for a fixed formula $\phi$. Otherwise, we write $\mathbb{P}^{n,m}$ for the uniform measure on $k$-SAT formulas with $n$
variables and $m$ clauses, and $\mathbb{P}^{n,\alpha}$ for the uniform measure on $k$-SAT formulas with $n$ variables and
$m = \alpha n$ clauses. We often drop $n$, $m$, and/or $\alpha$ when they are clear from the context. Finally, we
use $\mathbb{E}^\phi$, $\mathbb{E}^{n,m}$ and $\mathbb{E}^{n,\alpha}$ to denote expectations with respect to the distributions $p^\phi$, $\mathbb{P}^{n,m}$ and $\mathbb{P}^{n,\alpha}$
respectively.



5.1 Size of cores 

We first prove a result that establishes that cores, if they exist, are typically at least a certain linear
fraction $c(\alpha, k)$ of the total number $n$ of variables.

Proposition 8. Let $\phi$ be a random $k$-SAT formula with $m = \alpha n$ clauses where $k \geq 3$. Then for all
positive integers $C$ it holds that

\[
\mathbb{P}^{n,\alpha}[\, \phi \text{ has a core with } C \text{ clauses} \,] \leq \left( \frac{e^2 \alpha C^{k-2}}{n^{k-2}} \right)^{C}. \tag{18}
\]

Consequently, if we define $c(\alpha, k) := (\alpha e^2)^{-1/(k-2)}$, then with $\mathbb{P}^{n,\alpha}$-probability tending to one as
$n \to +\infty$, there are no cores of size strictly less than $c(\alpha, k)\, n$.

Proof. Suppose that the formula $\phi$ has a core with $C$ clauses. Note that the variables in these
clauses all lie in some set of at most $C$ variables. Thus the probability that a core with $C$ clauses
exists is bounded by the probability that there is a set of $C$ clauses all of whose variables lie in some
set of size at most $C$. This probability is bounded by

\[
\binom{m}{C} \binom{n}{C} \left( \frac{C}{n} \right)^{Ck},
\]

which can be upper bounded by

\[
\left( \frac{em}{C} \right)^{C} \left( \frac{en}{C} \right)^{C} \left( \frac{C}{n} \right)^{Ck}
= \left( \frac{e^2 \alpha C^{k-2}}{n^{k-2}} \right)^{C},
\]
as needed. □

5.2 (Meta)-stability of the all * assignment for small $\rho$

By definition, the extended MRF for $\rho = 1$ assigns positive mass to the all-* vector. Moreover,
Proposition 8 implies that the size of cores (when they exist) is typically linear in $n$. It follows that
the state space of the MRF for $\rho = 1$ typically satisfies one of the following properties:

• Either the state space is trivial, meaning that it contains only the all * state, or

• The state space is disconnected with respect to all random walks based on updating a small
linear fraction of the coordinates in each step.

The goal of this section is to establish that a similar phenomenon persists when $\rho$ is close to 1 (i.e.,
when $1 - \rho$ is small).
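To get a feel for the linear lower bound on core size from Proposition 8, the following sketch evaluates the threshold fraction $c(\alpha, k) = (\alpha e^2)^{-1/(k-2)}$ and the logarithm of the bound $\left(e^2 \alpha C^{k-2} / n^{k-2}\right)^{C}$. The function names are hypothetical, and the values $\alpha = 4.2$, $k = 3$, $n = 10000$ are chosen only for illustration.

```python
import math

def core_size_fraction(alpha, k):
    """Threshold fraction c(alpha, k) = (alpha * e^2)^(-1/(k-2))."""
    return (alpha * math.e ** 2) ** (-1.0 / (k - 2))

def log_core_bound(alpha, k, n, C):
    """Natural log of (e^2 * alpha * C^(k-2) / n^(k-2))^C.

    Working in logs avoids underflow when C is large.
    """
    return C * (2.0 + math.log(alpha) + (k - 2) * math.log(C / n))

c = core_size_fraction(4.2, 3)
print(round(c, 4))                            # roughly 0.0322
print(log_core_bound(4.2, 3, 10000, 100))     # negative: the bound is below 1
print(log_core_bound(4.2, 3, 10000, 1000))    # positive: the bound is vacuous
```

For core sizes $C$ below $c(\alpha, k)\, n$ the logarithm is negative, so the bound decays geometrically in $C$; above the threshold the bound exceeds one and carries no information.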

We begin by introducing some notions from the analysis of the mixing properties of Markov
chains. Let $T$ be a reversible chain with respect to a measure $p$ on a state space $\Omega$. For sets
$A, B \subset \Omega$, write

\[
q_T(A, B) = \sum_{x \in A,\, y \in B} p(x)\, T_{x \to y} = \sum_{x \in A,\, y \in B} p(y)\, T_{y \to x}.
\]

The conductance of the chain $T$ is defined as

\[
c(T) = \inf_{S \subset \Omega} \frac{q_T(S, S^c)}{p(S)\,(1 - p(S))}.
\]

It is well-known that $c(T)/2$ is an upper bound on the spectral gap of the chain $T$ and that $2/c(T)$
is a lower bound on the mixing time of the chain. We note moreover that the definition of $T$ implies
that for every two sets $A, B$ it holds that $q_T(A, B) \leq \min\{p(A), p(B)\}$.
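For a small chain, the conductance defined above can be computed by brute force over all proper nonempty subsets. The sketch below is only an illustration under that assumption (it is exponential in the number of states); the names `flow` and `conductance` are hypothetical.

```python
from itertools import combinations

def flow(p, T, A, B):
    """q_T(A, B): sum over x in A and y in B of p[x] * T[x][y]."""
    return sum(p[x] * T[x][y] for x in A for y in B)

def conductance(p, T):
    """Minimum of q_T(S, S^c) / (p(S) * (1 - p(S))) over proper nonempty S."""
    states = range(len(p))
    best = float("inf")
    for size in range(1, len(p)):
        for S in combinations(states, size):
            members = set(S)
            complement = [y for y in states if y not in members]
            p_S = sum(p[x] for x in S)
            best = min(best, flow(p, T, S, complement) / (p_S * (1.0 - p_S)))
    return best

# Two-state symmetric chain: stationary measure (1/2, 1/2), flip probability 0.1.
p = [0.5, 0.5]
T = [[0.9, 0.1], [0.1, 0.9]]
print(conductance(p, T))   # (0.5 * 0.1) / (0.5 * 0.5), i.e. about 0.2
```

The same helper makes the bound $q_T(A, B) \leq \min\{p(A), p(B)\}$ easy to check numerically on small examples.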

Definition 9. Consider a probability measure $p$ on a space $\Omega$ of strings of length $n$. Let $T$ be a
Markov chain on $\Omega$. The radius of $T$, denoted by $r(T)$, is defined by

\[
r(T) := \sup\{ d_H(x, y) : T_{x,y} > 0 \}, \tag{19}
\]

where $d_H$ is the Hamming distance. We let the radius-$r$ conductance of $p$, denoted by $c(r, p)$, be

\[
c(r, p) := \sup\{ c(T) : T \text{ is reversible with respect to } p \text{ and } r(T) \leq r \}. \tag{20}
\]

Now returning to the random $k$-SAT problem, we write $p^{\phi}_{\rho}$ for the measure $p_W = p^{\phi}_W$ with
$\omega_* = \rho$ and $\omega_o = 1 - \rho$.

Proposition 10. Consider a randomly chosen $k$-SAT formula with density $\alpha$. Then there exists
a $\rho_0 \in (0, 1)$ such that if $\rho > \rho_0$ then $\mathbb{P}^{n}[\phi \in A_n \cup B_n] \to 1$ as $n \to +\infty$, where $A_n$ and $B_n$ are the
following events:

(I) $A_n$ consists of all the formulas $\phi$ satisfying $p^{\phi}_{\rho}[\, n - n_*(x) \leq 2\sqrt{1 - \rho}\; n \,] \geq 1 - \exp(-\Omega(n))$.

(II) $B_n$ consists of all the formulas $\phi$ for which the measure $p^{\phi}_{\rho}$ satisfies
$c(\sqrt{1 - \rho}\; n,\, p^{\phi}_{\rho}) \leq \exp(-\Omega(n))$.

Proof. We let $\delta$ be a small positive number to be determined, and set $1 - \rho = \delta^2$. As it suffices to
work with ratios of probabilities, we use the unnormalized weight $W^{\phi}(x)$ instead of $p^{\phi}(x)$.
The proof requires the following:

Lemma 11. Let $d$ be an integer satisfying $\delta n \leq d \leq 2\delta n$. For $\delta$ sufficiently small, it holds that
with $\mathbb{P}^{n}$ probability going to 1 as $n \to \infty$,

\[
\frac{W^{\phi}[\, n - n_*(x) = d \,]}{\rho^{3n}} = \exp(-\Omega(n)). \tag{21}
\]

Proof. See Appendix C.1. □

To establish the proposition, it suffices to show that for any formula $\phi$ for which equation (21)
of Lemma 11 is valid, one of either condition (I) or condition (II) must hold.

(i) First suppose that $W^{\phi}[\, n - n_*(x) > 2\delta n \,] < \rho^{3n/2}$. In this case, condition (I) in the statement
of the proposition follows immediately.

(ii) Otherwise, we may take $W^{\phi}[\, n - n_*(x) > 2\delta n \,] \geq \rho^{3n/2}$. In this case, we can apply the
conductance bound in order to bound the gap of any operator with radius at most $\delta n$. Take the set
$A$ to be all $x$ with $n - n_*(x) < \delta n$ and $B$ to be the set of all $x$ with $\delta n \leq n - n_*(x) \leq 2\delta n$. Let
$T$ be any Markov chain with radius $\delta n$ that is reversible with respect to $p_W$. Then we have
$q_T(A, A^c) = q_T(A, B) \leq p(B)$. In addition, it holds that $W^{\phi}[\, n - n_*(x) < \delta n \,] \geq \rho^{n}$ (since if $x$
is the all-* assignment, we have $W^{\phi}(x) = \rho^{n}$); moreover, if we take $n$ sufficiently large, then
we have $W^{\phi}[\, \delta n \leq n - n_*(x) \leq 2\delta n \,] \leq \rho^{3n}$ by Lemma 11. Combining these inequalities, we
obtain that the conductance of $T$ is bounded above by

\[
\begin{aligned}
\frac{q_T(A, A^c)}{p(A)\, p(A^c)} &\leq \frac{p(B)}{p(A)\, p(A^c)} \\
&\leq \frac{W^{\phi}[\, \delta n \leq n - n_*(x) \leq 2\delta n \,]}{W^{\phi}[\, n - n_*(x) < \delta n \,]\; W^{\phi}[\, n - n_*(x) > 2\delta n \,]} \\
&\leq \frac{\rho^{3n}}{\rho^{n}\, \rho^{3n/2}} = \rho^{n/2},
\end{aligned}
\]

which implies condition (II). □



5.3 Message-passing algorithms on random ensembles 

The analysis of the preceding section demonstrated that for values of $\rho$ close to 1, any random
sampling technique based on local moves (e.g., Gibbs sampling), if started at the all * assignment,
will take exponentially long to get to an assignment with more than a negligible fraction of non-*
values. This section is devoted to establishing an analogous claim for the belief propagation updates
on the extended Markov random fields. More precisely, we prove that if $\rho$ is sufficiently close to 1,
then running belief propagation with initial messages that place most of their mass on * will result
in assignments that also place most of the mass on *.

This result is proved in the "density-evolution" setting [e.g.,^^ (i-e., the number of iterations 
is taken to be less than the girth of the graph, so that cycles have no effect). More formally, we
establish the following:

Theorem 12. For every formula density $\alpha > 0$, arbitrary scalars $\epsilon'' > 0$ and $\delta > 0$, there exist
$\rho' < 1$, $\epsilon' \in (0, \epsilon'')$ and $\gamma > 0$ such that for all $\rho \in (\rho', 1]$ and $\epsilon \in (0, \epsilon')$, the algorithm SP($\rho$)
satisfies the following condition.

Consider a random formula $\phi$, a random clause $b$ and a random variable $i$ that belongs to the
clause $b$. Then with probability at least $1 - \delta$, if SP($\rho$) is initialized with all messages $\eta^{0}_{a \to j} < \epsilon$,
then the inequality $\eta^{t}_{b \to i} < \epsilon'$ holds for all iterations $t = 0, 1, \ldots, \gamma \log n$.

The first step of the proof is to compare the SP iterations to simpler "sum-product" iterations.

Lemma 13. For any $\rho \in [0, 1]$, the SP($\rho$) iterations satisfy the inequality:

\[
\eta^{t+1}_{a \to i} \leq \prod_{j \in V(a) \setminus \{i\}} \min\Big( 1,\; (1 - \rho) + \rho \sum_{b \in C^u_a(j)} \eta^{t}_{b \to j} \Big).
\]

Proof. See Appendix C.2. □

Since our goal is to bound the messages $\eta^{t+1}_{a \to i}$, Lemma 13 allows us to analyze the simpler
message-passing algorithm with updates specified by:

\[
\eta^{t+1}_{a \to i} = \prod_{j \in V(a) \setminus \{i\}} \min\Big( 1,\; (1 - \rho) + \rho \sum_{b \in C^u_a(j)} \eta^{t}_{b \to j} \Big). \tag{22}
\]

The next step is to bound the probability of "short-cycles" in the computation tree corresponding
to the message-passing updates specified in equation (22). More formally, given a formula $\phi$,
we define a directed graph $G(\phi) = (V, E)$, in which the vertex set $V$ consists of messages $\eta_{a \to i}$. The
edge $\eta_{a \to i} \to \eta_{b \to j}$ belongs to $E$ if and only if $j \in V(a) \setminus \{i\}$ and $b \in C^u_a(j)$.
In words, the graph $G(\phi)$ includes an edge between $\eta_{a \to i}$ and $\eta_{b \to j}$ if the latter is involved in
the update of $\eta_{a \to i}$ specified in equation (22).

Lemma 14. Let $G(\phi)$ be the random graph generated by choosing a formula $\phi$ uniformly at random
with $\alpha n$ clauses and $n$ variables. Let $v$ be a vertex of $G(\phi)$ chosen uniformly at random. For all
clause densities $\alpha > 0$, there exists $\gamma > 0$ such that with probability $1 - o(1)$, the vertex $v$ does not
belong to any directed cycle of length smaller than $\gamma \log n$ in $G(\phi)$.

Proof. The proof is based on standard arguments from random graph theory [e.g., 21]. □

Our analysis of the recursion (22) on the computation tree is based on an edge exposure
technique that generates a neighborhood of a vertex $v$ in the graph $G(\phi)$ for a random $\phi$. More
specifically, pick a clause $a$ and a variable $i$ in $a$ at random. Now for each variable $j \in V(a) \setminus \{i\}$,
expose all clauses $b$ containing $j$ (but not any other of the variables appearing so far). Then for
each such $b$, we look at all variables $k \in V(b) \setminus \{j\}$, and so on. We consider the effect of repeating
this exposure procedure over $t = \gamma \log n$ steps. When the vertex $\eta_{a \to i}$ does not belong to cycles
shorter than $t$ in $G(\phi)$, such an analysis yields a bound on $\eta^{t}_{a \to i}$.

Note that each clause can expose at most $k - 1$ variables. Recall that we generate the formula $\phi$
by choosing each of the $N_c = 2^k \binom{n}{k}$ clauses with probability $\alpha n / N_c$. The distribution of the number
of clauses exposed for each variable is thus dominated by $\mathrm{Bin}(M_c, \alpha n / N_c)$ where $M_c = 2^k \binom{n-1}{k-1}$. An
equivalent description of this process is the following: each vertex $v = \eta_{a \to i}$ exposes $X_v$ neighbors
$\eta_{b \to j}$, where the distribution of the collection $\{X_v\}$ is dominated by a collection $\{Y_v\}$ of i.i.d. random
variables. Moreover, each $Y_v$ is distributed as the sum of $k - 1$ i.i.d. $\mathrm{Bin}(M_c, \alpha n / N_c)$
variables.
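The simplified recursion analyzed in this section replaces each message $\eta_{a \to i}$ by the product, over the other variables $j$ of clause $a$, of $\min(1, (1 - \rho) + \rho \sum_b \eta_{b \to j})$, where $b$ ranges over the other clauses in which $j$ appears with the opposite sign. A minimal sketch of one parallel sweep follows. The clause encoding (tuples of signed integers, as in the DIMACS convention) and the function name are assumptions made for illustration.

```python
def simplified_update(eta, clauses, rho):
    """One parallel sweep of the simplified message-passing recursion.

    clauses : list of tuples of nonzero ints; literal +v / -v means that
              variable v appears positively / negatively (assumed encoding).
    eta     : dict mapping (clause index a, variable i) to a message in [0, 1].
    """
    # sign[a][v] is +1 or -1 according to how variable v appears in clause a.
    sign = [{abs(lit): (1 if lit > 0 else -1) for lit in clause} for clause in clauses]
    new_eta = {}
    for a in range(len(clauses)):
        for i in sign[a]:
            product = 1.0
            for j in sign[a]:
                if j == i:
                    continue
                # Other clauses that contain j with the opposite sign.
                total = sum(
                    eta[(b, j)]
                    for b in range(len(clauses))
                    if b != a and j in sign[b] and sign[b][j] != sign[a][j]
                )
                product *= min(1.0, (1.0 - rho) + rho * total)
            new_eta[(a, i)] = product
    return new_eta

# A single 3-clause: every sum is empty, so each message becomes (1 - rho)^2.
clauses = [(1, 2, 3)]
eta = {(0, v): 0.5 for v in (1, 2, 3)}
print(simplified_update(eta, clauses, rho=0.9)[(0, 1)])   # about 0.01
```

With $\rho$ close to 1 and small incoming messages, each factor is small, which is the mechanism the proof exploits.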

The proof requires the following lemma on branching processes.

Lemma 15. Consider a branching process where each vertex gives birth to $Y$ children. Assume
further that the branching process is stopped after $m$ levels and let $K > 0$ be given.

The notion of a good vertex is defined inductively as follows. All vertices at level $m$ are good.
A vertex at level $m - 1$ is good if it has $\ell$ children and $\ell \leq K$. By induction for $s \geq 2$ we call a
vertex $v$ at level $m - s$ good if $v$ has $\ell$ children $v_1, \ldots, v_\ell$ with $\ell \leq K$ and

(a) Either all of $v_1, \ldots, v_\ell$ have at most $K$ children, of which all are good; or

(b) all of $v_1, \ldots, v_\ell$ have at most $K$ children, of which all but one are good.

Denote by $p(m, K)$ the probability that the root of the branching process is good. Then

\[
\inf_{0 \leq m < \infty} p(m, K) = 1 - \exp(-\Omega(K)).
\]

Proof. See Appendix C.3. □

We are now equipped to complete the proof of Theorem 12. Using Lemma 14, first choose
$\gamma = \gamma(\alpha)$ such that a random vertex in $G(\phi)$ does not belong to cycles shorter than $\gamma \log n$ with
probability $1 - o(1)$. Next use Lemma 15 to choose $K$ such that the probability $\inf_{0 \leq m < \infty} p(m, K)$
that the root of the branching process is good is at least $1 - \delta/2$.

Next we define a pair of functions $\theta$ and $\zeta$ (each mapping $\mathbb{R} \times \mathbb{R}$ to the real line) in the following
way:

\[
\theta(\epsilon, \rho) := (1 - \rho) + K \rho \epsilon, \qquad
\zeta(\epsilon, \rho) := \theta(\theta(\epsilon, \rho), \rho) \times \theta(\theta^2(\epsilon, \rho), \rho).
\]

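Setting $\epsilon' := \min(\epsilon'', \frac{1}{2K^5})$, observe that $\theta(\epsilon', 1) = K\epsilon'$ and therefore $\theta^2(\epsilon', 1) < \frac{\epsilon'}{2}$ and

\[
\zeta(\epsilon', 1) = \theta(K\epsilon', 1)\, \theta((K\epsilon')^2, 1) = (K^2 \epsilon')(K^3 \epsilon'^2) = K^5 \epsilon'^3 < \frac{\epsilon'}{2}.
\]

It now follows by continuity that there exists $\rho' < 1$ such that for all $1 \geq \rho > \rho'$ it holds that

\[
\theta^2(\epsilon', \rho) < \frac{\epsilon'}{2}, \qquad \zeta(\epsilon', \rho) < \frac{\epsilon'}{2}. \tag{23}
\]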
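The two conditions in (23) are easy to explore numerically. The sketch below assumes the definitions $\theta(\epsilon, \rho) = (1 - \rho) + K\rho\epsilon$ and $\zeta(\epsilon, \rho) = \theta(\theta(\epsilon, \rho), \rho) \cdot \theta(\theta(\epsilon, \rho)^2, \rho)$; the function names and the sample values $K = 2$, $\epsilon' = 0.01$ are hypothetical and chosen only for illustration.

```python
def theta(eps, rho, K):
    """theta(eps, rho) = (1 - rho) + K * rho * eps."""
    return (1.0 - rho) + K * rho * eps

def zeta(eps, rho, K):
    """zeta(eps, rho) = theta(theta(eps, rho), rho) * theta(theta(eps, rho)^2, rho)."""
    t = theta(eps, rho, K)
    return theta(t, rho, K) * theta(t * t, rho, K)

def satisfies_23(eps, rho, K):
    """Check both inequalities: theta^2 < eps / 2 and zeta < eps / 2."""
    return theta(eps, rho, K) ** 2 < eps / 2 and zeta(eps, rho, K) < eps / 2

K, eps = 2, 0.01
print(zeta(eps, 1.0, K))            # equals K^5 * eps^3, about 3.2e-05
print(satisfies_23(eps, 0.999, K))  # True: rho close enough to 1
print(satisfies_23(eps, 0.5, K))    # False: rho too far from 1
```

At $\rho = 1$ both quantities are polynomial in $\epsilon'$, and the continuity argument says the inequalities survive for $\rho$ slightly below 1; the last line shows that they fail once $1 - \rho$ is no longer small.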

We claim that the statement of the theorem holds with the choices of $\gamma$, $\epsilon'$ and $\rho'$ above. Indeed,
choose a formula $\phi$ with density $\alpha$ at random and let $v = \eta_{a \to i}$ be a random vertex of $G(\phi)$. With
probability at least $1 - \delta/2$, the vertex $v$ does not belong to any cycle shorter than $t = \gamma \log n$.

Since $v$ does not belong to any such cycle, the first $t$ levels of the computation tree of $v$ may
be obtained by the exposure process defined above. We will then compare the computation tree to
an exposure process where each variable gives birth to exactly $\mathrm{Bin}(M_c, \alpha n / N_c)$ clauses. Since the
messages are generated according to (22), any bound derived on the values of non-* messages for
the larger tree implies the same bound for the real computation tree.

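We now claim that if $v$ is a good vertex on that tree, then the message at $v$ after $t$ iterations,
namely $\eta^{t}_{a \to i}$, is at most $\epsilon'$. Since a vertex of the tree is good with probability at least $1 - \delta/2$,
proving this claim will establish the theorem.

We prove this claim by induction on $s$, where $m - s$ is the level of $w$. For $s = 0$, the claim
follows immediately from the initialization of the messages. For $s = 1$, observe that equation (22)
implies that if $w = \eta_{b \to j}$ is good at level $m - 1$, then

\[
\eta_{b \to j} \leq \big( (1 - \rho) + K\rho\epsilon' \big)^{k-1} \leq \theta^2(\epsilon', \rho) < \epsilon'/2.
\]

For the general induction step, assume that $w = \eta_{b \to j}$ at level $m - s$ is good and $s \geq 2$. There are
two cases to consider:

(i) $w$ has all its grandchildren good. In this case we repeat the argument above twice to obtain

\[
\eta_{b \to j} \leq \big( (1 - \rho) + K\rho\, \theta^2(\epsilon', \rho) \big)^{k-1} \leq \theta^2(\epsilon', \rho) < \epsilon'/2.
\]

(ii) Exactly one of the grandchildren of $w = \eta_{b \to j}$ is not good. Let $y' = \eta_{d' \to \ell'}$ denote this grandchild
and $y = \eta_{d \to \ell}$ denote the parent of $y'$. Then by equation (22):

\[
\eta_{d \to \ell} \leq \big( (1 - \rho) + K\rho\epsilon' \big)^{k-2} \leq (1 - \rho) + K\rho\epsilon' = \theta(\epsilon', \rho).
\]

Using (22) again yields

\[
\begin{aligned}
\eta_{b \to j} &\leq \big( (1 - \rho) + K\rho\, \theta(\epsilon', \rho) \big) \big( (1 - \rho) + K\rho\, \theta^2(\epsilon', \rho) \big)^{k-2} \\
&\leq \big( (1 - \rho) + K\rho\, \theta(\epsilon', \rho) \big) \big( (1 - \rho) + K\rho\, \theta^2(\epsilon', \rho) \big) = \zeta(\epsilon', \rho) < \epsilon'/2,
\end{aligned}
\]

which completes the proof.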



6 Conclusion 

The survey propagation algorithm, recently introduced by Mezard, Parisi and Zecchina [^S] for 
solving random instances of $k$-SAT problems, has sparked a great deal of excitement and research
in both the statistical physics and computer science communities [e.g.,ini[31IZ101IlllSllE31l^- This 
paper provides a new interpretation of the survey propagation algorithm, namely as an instance
of the well-known belief propagation algorithm, but as applied to a novel probability distribution
over the partial satisfiability assignments associated with a $k$-SAT formula. The perspective of
this paper reveals the combinatorial structure that underlies the survey propagation algorithm, and
we established various results on the form of these structures and the behavior of message-passing
algorithms, both for fixed instances and over random ensembles.

The current work suggests various questions and open issues for further research. As we described,
associated with any $k$-SAT problem is a large family of Markov random fields over partial
assignments, as specified by the parameter $\rho$ (or more generally, the parameters $\omega_o$ and $\omega_*$). Further
analysis of survey propagation and its generalizations requires a deeper understanding of the following
two questions. First, for what parameter choices do the marginals of the associated Markov
random field yield useful information about the structure of satisfiability assignments? Second, for
what parameter choices do efficient message-passing algorithms like belief propagation yield accurate
approximations to these marginals? Our results show that the success of SP-like algorithms
depends on a delicate balance between these two factors. (For instance, the marginals of the uniform
distribution over SAT assignments clearly contain useful information, but belief propagation fails
to yield good approximations for sufficiently large clause densities.) More generally, these questions
fall in a broader collection of issues, all related to a deeper understanding of satisfiability problems
and especially the relationship between finite satisfiability problems and their asymptotic analysis.
Given the fundamental role that satisfiability plays in diverse branches of computer science, further
progress on these issues is of broad interest.

A Belief propagation on a generic factor graph

Given a subset $S \subseteq \{1, 2, \ldots, n\}$, we define $x_S := \{x_i \mid i \in S\}$. Consider a probability distribution
on $n$ variables $x_1, x_2, \ldots, x_n$, that can be factorized as

\[
p(x_1, x_2, \ldots, x_n) = \frac{1}{Z} \prod_{i=1}^{n} \psi_i(x_i) \prod_{a \in C} \psi_a(x_{V(a)}), \tag{24}
\]

where for each $a \in C$ the set $V(a)$ is a subset of $\{1, 2, \ldots, n\}$; and $\psi_i(x_i)$ and $\psi_a(x_{V(a)})$ are
non-negative real functions, referred to as compatibility functions, and

\[
Z := \sum_{x} \Big[ \prod_{i=1}^{n} \psi_i(x_i) \prod_{a \in C} \psi_a(x_{V(a)}) \Big] \tag{25}
\]

is the normalization constant or partition function. A factor graph representation of this probability
distribution is a bipartite graph with vertices $V$ corresponding to the variables, called variable nodes,
and vertices $C$ corresponding to the sets $V(a)$ and called function nodes. There is an edge between
a variable node $i$ and function node $a$ if and only if $i \in V(a)$. We write also $a \in C(i)$ if $i \in V(a)$.

Suppose that we wish to compute the marginal probability of a single variable i for such a 
distribution, as defined in equation ((S)). The belief propagation or sum-product algorithm is an 
efficient algorithm for computing the marginal probability distribution of each variable, assuming 
that the factor graph is acyclic. The essential idea is to use the distributive property of the sum and 
product operations to compute independent terms for each subtree recursively. These recursions 
can be cast as a message-passing algorithm, in which adjacent nodes on the factor graph exchange 
intermediate values. Let each node only have access to its corresponding compatibility function. 
As soon as a node has received messages from all neighbors below it, it can send a message up 
the tree containing the term in the computation corresponding to it. In particular, let the vector
$M_{i \to a}$ denote the message passed by variable node $i$ to function node $a$; similarly, the quantity
$M_{a \to i}$ denotes the message that function node $a$ passes to variable node $i$.

The messages from function to variables are updated in the following way:

\[
M_{a \to i}(x_i) \propto \sum_{x_{V(a) \setminus \{i\}}} \Big[ \psi_a(x_{V(a)}) \prod_{j \in V(a) \setminus \{i\}} M_{j \to a}(x_j) \Big]. \tag{26}
\]

In the other direction, the messages from variable nodes to function nodes are updated as follows:

\[
M_{i \to a}(x_i) \propto \psi_i(x_i) \prod_{b \in C(i) \setminus \{a\}} M_{b \to i}(x_i). \tag{27}
\]

It is straightforward to show that for a factor graph without cycles, these updates will converge
after a finite number of iterations. Upon convergence, the local marginal distributions at variable
nodes and function nodes can be computed, using the message fixed point $M$, as follows:

\[
F_i(x_i) \propto \psi_i(x_i) \prod_{b \in C(i)} M_{b \to i}(x_i), \tag{28a}
\]

\[
F_a(x_{V(a)}) \propto \psi_a(x_{V(a)}) \prod_{j \in V(a)} M_{j \to a}(x_j). \tag{28b}
\]
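As an illustration of the sum-product updates on an acyclic factor graph, the following sketch runs a parallel schedule of the function-to-variable and variable-to-function updates and compares the resulting variable marginals with brute-force enumeration. It is a minimal sketch, not an optimized implementation; the data layout (`domains`, `unary`, `factors`) and all function names are assumptions made for this example.

```python
import itertools
from math import prod

def normalize(vec):
    total = sum(vec)
    return [v / total for v in vec]

def belief_propagation(domains, unary, factors, iterations):
    """Sum-product with a parallel schedule.

    domains[i] : list of values of variable i
    unary[i]   : function psi_i(x_i)
    factors[a] : (scope, psi_a); scope is a tuple of distinct variable indices,
                 psi_a takes a tuple of values in scope order
    """
    edges = [(a, i) for a, (scope, _) in enumerate(factors) for i in scope]
    var_to_fac = {(i, a): [1.0] * len(domains[i]) for a, i in edges}
    fac_to_var = {(a, i): [1.0] * len(domains[i]) for a, i in edges}
    for _ in range(iterations):
        new_fv = {}
        for a, i in edges:                       # function-to-variable update
            scope, psi = factors[a]
            msg = [0.0] * len(domains[i])
            for idx in itertools.product(*[range(len(domains[j])) for j in scope]):
                w = psi(tuple(domains[j][t] for j, t in zip(scope, idx)))
                for j, t in zip(scope, idx):
                    if j != i:
                        w *= var_to_fac[(j, a)][t]
                msg[idx[scope.index(i)]] += w
            new_fv[(a, i)] = normalize(msg)
        new_vf = {}
        for a, i in edges:                       # variable-to-function update
            msg = []
            for t, value in enumerate(domains[i]):
                w = unary[i](value)
                for b, (scope, _) in enumerate(factors):
                    if b != a and i in scope:
                        w *= new_fv[(b, i)][t]
                msg.append(w)
            new_vf[(i, a)] = normalize(msg)
        fac_to_var, var_to_fac = new_fv, new_vf
    marginals = []
    for i, dom in enumerate(domains):            # variable marginals
        belief = []
        for t, value in enumerate(dom):
            w = unary[i](value)
            for a, (scope, _) in enumerate(factors):
                if i in scope:
                    w *= fac_to_var[(a, i)][t]
            belief.append(w)
        marginals.append(normalize(belief))
    return marginals

def brute_force_marginals(domains, unary, factors):
    """Exact marginals by enumerating every joint configuration."""
    totals = [[0.0] * len(d) for d in domains]
    for idx in itertools.product(*[range(len(d)) for d in domains]):
        x = [domains[i][t] for i, t in enumerate(idx)]
        w = prod(unary[i](x[i]) for i in range(len(x)))
        w *= prod(psi(tuple(x[j] for j in scope)) for scope, psi in factors)
        for i, t in enumerate(idx):
            totals[i][t] += w
    return [normalize(t) for t in totals]

# Chain x0 - f0 - x1 - f1 - x2 with binary variables (a tree, so BP is exact).
domains = [[0, 1], [0, 1], [0, 1]]
unary = [lambda v: 1.0 + v, lambda v: 1.0, lambda v: 2.0 - v]
factors = [((0, 1), lambda xs: 2.0 if xs[0] == xs[1] else 1.0),
           ((1, 2), lambda xs: 3.0 if xs[0] != xs[1] else 1.0)]
bp = belief_propagation(domains, unary, factors, iterations=5)
exact = brute_force_marginals(domains, unary, factors)
```

Messages are normalized after each update, which is harmless because the updates are only defined up to proportionality. On this chain, five sweeps exceed the diameter of the factor graph, so `bp` and `exact` agree up to floating-point error.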

The same updates, when applied to a graph with cycles, are no longer exact. An exact algorithm
will generally require exponential time. For certain problems, including error-control coding,
applying belief propagation to a graph with cycles gives excellent results. Since there are no
leaves on graphs with cycles, the algorithm is usually initialized by sending random
messages on all edges, and is run until the messages converge to some fixed value [24].

B Derivation of BP updates on the extended MRF

B.1 Messages from variables to clauses

We first focus on the update of messages from variables to clauses. Recall that we use the notation
$P_i = S \cup \{a\}$ as a shorthand for the event

\[
a \in P_i \quad \text{and} \quad S = P_i \setminus \{a\} \subseteq C^s_a(i),
\]

where it is understood that $S$ could be empty.

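Lemma 16 (Variable to clause messages). The variable to clause message vector $M_{i \to a}$ is fully
specified by values for pairs $(x_i, P_i)$ of the form:

\[
\{ (s_{a,i}, S \cup \{a\}),\; (s_{a,i}, \emptyset \neq P_i \subseteq C^s_a(i)),\; (u_{a,i}, \emptyset \neq P_i \subseteq C^u_a(i)),\; (s_{a,i}, \emptyset),\; (u_{a,i}, \emptyset),\; (*, \emptyset) \}.
\]

Specifically, the updates for these six pairs take the following form:

\[
M_{i \to a}(s_{a,i},\, P_i = S \cup \{a\}) = \prod_{b \in S} M^s_{b \to i} \prod_{b \in C^s_a(i) \setminus S} M^*_{b \to i} \prod_{b \in C^u_a(i)} M^u_{b \to i} \tag{29a}
\]

\[
M_{i \to a}(s_{a,i},\, \emptyset \neq P_i \subseteq C^s_a(i)) = \prod_{b \in P_i} M^s_{b \to i} \prod_{b \in C^s_a(i) \setminus P_i} M^*_{b \to i} \prod_{b \in C^u_a(i)} M^u_{b \to i} \tag{29b}
\]

\[
M_{i \to a}(u_{a,i},\, \emptyset \neq P_i \subseteq C^u_a(i)) = \prod_{b \in P_i} M^s_{b \to i} \prod_{b \in C^u_a(i) \setminus P_i} M^*_{b \to i} \prod_{b \in C^s_a(i)} M^u_{b \to i} \tag{29c}
\]

\[
M_{i \to a}(s_{a,i},\, P_i = \emptyset) = \omega_o \prod_{b \in C^s_a(i)} M^*_{b \to i} \prod_{b \in C^u_a(i)} M^u_{b \to i} \tag{29d}
\]

\[
M_{i \to a}(u_{a,i},\, P_i = \emptyset) = \omega_o \prod_{b \in C^u_a(i)} M^*_{b \to i} \prod_{b \in C^s_a(i)} M^u_{b \to i} \tag{29e}
\]

\[
M_{i \to a}(*,\, P_i = \emptyset) = \omega_* \prod_{b \in C(i) \setminus \{a\}} M^*_{b \to i} \tag{29f}
\]

Proof. The form of these updates follows immediately from the definition (10) of the variable
compatibilities in the extended MRF, and the BP message update (27). □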






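B.2 Forms of R quantities

In this section, we compute the specific forms of the linear sums of messages defined in equation (14).
First, we use the definition (14a) and Lemma 16 to compute the form of $R^s_{i \to a}$:

\[
\begin{aligned}
R^s_{i \to a} &:= \sum_{S \subseteq C^s_a(i)} M_{i \to a}(s_{a,i},\, P_i = S \cup \{a\}) \\
&= \sum_{S \subseteq C^s_a(i)} \prod_{b \in S} M^s_{b \to i} \prod_{b \in C^s_a(i) \setminus S} M^*_{b \to i} \prod_{b \in C^u_a(i)} M^u_{b \to i} \\
&= \prod_{b \in C^u_a(i)} M^u_{b \to i} \Big[ \prod_{b \in C^s_a(i)} (M^s_{b \to i} + M^*_{b \to i}) \Big].
\end{aligned}
\]

Similarly, the definition (14b) and Lemma 16 allow us to compute the following form of $R^u_{i \to a}$:

\[
\begin{aligned}
R^u_{i \to a} &= \sum_{S \subseteq C^u_a(i),\, S \neq \emptyset} \prod_{b \in S} M^s_{b \to i} \prod_{b \in C^u_a(i) \setminus S} M^*_{b \to i} \prod_{b \in C^s_a(i)} M^u_{b \to i}
 + \omega_o \prod_{b \in C^u_a(i)} M^*_{b \to i} \prod_{b \in C^s_a(i)} M^u_{b \to i} \\
&= \prod_{b \in C^s_a(i)} M^u_{b \to i} \Big[ \prod_{b \in C^u_a(i)} (M^s_{b \to i} + M^*_{b \to i}) - (1 - \omega_o) \prod_{b \in C^u_a(i)} M^*_{b \to i} \Big].
\end{aligned}
\]

Finally, we compute $R^*_{i \to a}$ using the definition (14c) and Lemma 16:

\[
\begin{aligned}
R^*_{i \to a} &= \Big[ \sum_{S \subseteq C^s_a(i)} M_{i \to a}(s_{a,i},\, P_i = S) \Big] + M_{i \to a}(*,\, P_i = \emptyset) \\
&= \Big[ \sum_{S \subseteq C^s_a(i),\, S \neq \emptyset} \prod_{b \in S} M^s_{b \to i} \prod_{b \in C^s_a(i) \setminus S} M^*_{b \to i} \prod_{b \in C^u_a(i)} M^u_{b \to i}
 + \omega_o \prod_{b \in C^s_a(i)} M^*_{b \to i} \prod_{b \in C^u_a(i)} M^u_{b \to i} \Big] \\
&\qquad + \omega_* \prod_{b \in C^s_a(i)} M^*_{b \to i} \prod_{b \in C^u_a(i)} M^*_{b \to i} \\
&= \prod_{b \in C^u_a(i)} M^u_{b \to i} \Big[ \prod_{b \in C^s_a(i)} (M^s_{b \to i} + M^*_{b \to i}) - (1 - \omega_o) \prod_{b \in C^s_a(i)} M^*_{b \to i} \Big]
 + \omega_* \prod_{b \in C^s_a(i) \cup C^u_a(i)} M^*_{b \to i}.
\end{aligned}
\]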
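The closed forms in this section all rest on the same subset-sum identity, $\sum_{S \subseteq A} \prod_{b \in S} x_b \prod_{b \in A \setminus S} y_b = \prod_{b \in A} (x_b + y_b)$. The sketch below checks two of them numerically against explicit sums over subsets, for randomly drawn message values. The function names and the small clause sets `Cs` and `Cu` (standing in for $C^s_a(i)$ and $C^u_a(i)$) are hypothetical.

```python
import itertools
import math
import random

def subsets(items):
    """Yield every subset of `items` as a tuple."""
    for r in range(len(items) + 1):
        yield from itertools.combinations(items, r)

def r_s_explicit(Ms, Mstar, Mu, Cs, Cu):
    """Explicit sum over S subset of Cs of prod_S Ms * prod_(Cs minus S) Mstar * prod_Cu Mu."""
    total = 0.0
    for S in subsets(Cs):
        term = math.prod(Ms[b] for b in S)
        term *= math.prod(Mstar[b] for b in Cs if b not in S)
        total += term * math.prod(Mu[b] for b in Cu)
    return total

def r_s_closed(Ms, Mstar, Mu, Cs, Cu):
    return math.prod(Mu[b] for b in Cu) * math.prod(Ms[b] + Mstar[b] for b in Cs)

def r_u_explicit(Ms, Mstar, Mu, Cs, Cu, omega_o):
    """Nonempty parent sets inside Cu, plus the empty parent set weighted by omega_o."""
    unsat_part = math.prod(Mu[b] for b in Cs)
    total = omega_o * math.prod(Mstar[b] for b in Cu) * unsat_part
    for S in subsets(Cu):
        if S:
            term = math.prod(Ms[b] for b in S)
            term *= math.prod(Mstar[b] for b in Cu if b not in S)
            total += term * unsat_part
    return total

def r_u_closed(Ms, Mstar, Mu, Cs, Cu, omega_o):
    bracket = math.prod(Ms[b] + Mstar[b] for b in Cu)
    bracket -= (1.0 - omega_o) * math.prod(Mstar[b] for b in Cu)
    return math.prod(Mu[b] for b in Cs) * bracket

rng = random.Random(0)
Cs, Cu = [0, 1, 2], [3, 4]
Ms = {b: rng.random() for b in Cs + Cu}
Mstar = {b: rng.random() for b in Cs + Cu}
Mu = {b: rng.random() for b in Cs + Cu}
```

The explicit sums take time exponential in the number of neighboring clauses, whereas the closed forms are linear, which is what makes the resulting message-passing updates practical.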

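B.3 Clause to variable updates

In this section, we derive the form of the clause to variable updates.

Lemma 17 (Clause to variable messages). The updates of messages from clauses to variables
in the extended MRF take the following form:

\[
M^s_{a \to i} = \prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a} \tag{30a}
\]

\[
M^u_{a \to i} = \prod_{j \in V(a) \setminus \{i\}} (R^u_{j \to a} + R^*_{j \to a})
 + \sum_{k \in V(a) \setminus \{i\}} (R^s_{k \to a} - R^*_{k \to a}) \prod_{j \in V(a) \setminus \{i, k\}} R^u_{j \to a}
 - \prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a} \tag{30b}
\]

\[
M^*_{a \to i} = \prod_{j \in V(a) \setminus \{i\}} (R^u_{j \to a} + R^*_{j \to a}) - \prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a}. \tag{30c}
\]

Proof. (i) We begin by proving equation (30a). When $x_i = s_{a,i}$ and $P_i = S \cup \{a\}$ for some $S \subseteq C^s_a(i)$,
then the only possible assignment for the other variables at nodes in $V(a) \setminus \{i\}$ is $x_j = u_{a,j}$ and
$P_j \subseteq C^u_a(j)$. Accordingly, using the BP update equation (26), we obtain the following update for
$M^s_{a \to i} = M_{a \to i}(s_{a,i},\, P_i = S \cup \{a\})$:

\[
M^s_{a \to i} = \prod_{j \in V(a) \setminus \{i\}} \sum_{P_j \subseteq C^u_a(j)} M_{j \to a}(u_{a,j}, P_j)
 = \prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a}.
\]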

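(ii) Next we prove equation (30c). In the case $x_i = *$ and $P_i = \emptyset$, the only restriction on the
other variables $\{x_j : j \in V(a) \setminus \{i\}\}$ is that they are not all unsatisfying. The weight assigned to
the event that they are all unsatisfying is

\[
\sum_{\{S_j \subseteq C^u_a(j) \,:\, j \in V(a) \setminus \{i\}\}} \; \prod_{j \in V(a) \setminus \{i\}} M_{j \to a}(u_{a,j}, S_j)
 = \prod_{j \in V(a) \setminus \{i\}} \Big[ \sum_{S_j \subseteq C^u_a(j)} M_{j \to a}(u_{a,j}, S_j) \Big]
 = \prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a}. \tag{31}
\]

On the other hand, the weight assigned to the event that each is either unsatisfying, satisfying or
* can be calculated as follows. Consider a partition $J^u \cup J^s \cup J^*$ of the set $V(a) \setminus \{i\}$, where $J^u$, $J^s$
and $J^*$ correspond to the subsets of unsatisfying, satisfying and * assignments respectively. The
weight $W(J^u, J^s, J^*)$ associated with this partition takes the form

\[
\sum_{\{S_j \subseteq C^u_a(j) \,:\, j \in J^u\}} \; \sum_{\{S_j \subseteq C^s_a(j) \,:\, j \in J^s\}} \; \prod_{j \in J^u} M_{j \to a}(u_{a,j}, S_j) \prod_{j \in J^s} M_{j \to a}(s_{a,j}, S_j) \prod_{j \in J^*} M_{j \to a}(*, \emptyset).
\]

Simplifying by distributing the sum and product leads to

\[
W(J^u, J^s, J^*) = \prod_{j \in J^u} R^u_{j \to a} \prod_{j \in J^s} \big[ R^*_{j \to a} - M_{j \to a}(*, \emptyset) \big] \prod_{j \in J^*} M_{j \to a}(*, \emptyset),
\]

where we have used the definitions of $R^u_{j \to a}$ and $R^*_{j \to a}$ from Section B.2. Now summing $W(J^u, J^s, J^*)$
over all partitions $J^u \cup J^s \cup J^*$ of $V(a) \setminus \{i\}$ yields

\[
\begin{aligned}
\sum_{J^u \cup J^s \cup J^*} W(J^u, J^s, J^*)
&= \sum_{J^u \subseteq V(a) \setminus \{i\}} \prod_{j \in J^u} R^u_{j \to a} \sum_{J^s \cup J^* = V(a) \setminus \{J^u \cup i\}} \Big\{ \prod_{j \in J^s} \big[ R^*_{j \to a} - M_{j \to a}(*, \emptyset) \big] \prod_{j \in J^*} M_{j \to a}(*, \emptyset) \Big\} \\
&= \sum_{J^u \subseteq V(a) \setminus \{i\}} \prod_{j \in J^u} R^u_{j \to a} \prod_{j \in V(a) \setminus \{J^u \cup i\}} R^*_{j \to a} \\
&= \prod_{j \in V(a) \setminus \{i\}} \big[ R^u_{j \to a} + R^*_{j \to a} \big],
\end{aligned} \tag{32}
\]

where we have used the binomial identity twice. Overall, equations (31) and (32) together yield
that

\[
M^*_{a \to i} = \prod_{j \in V(a) \setminus \{i\}} \big[ R^u_{j \to a} + R^*_{j \to a} \big] - \prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a},
\]

which establishes equation (30c).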

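(iii) Finally, turning to equation (30b), for $x_i = u_{a,i}$ and $P_i \subseteq C^u_a(i)$, there are only two
possibilities for the values of $x_{V(a) \setminus \{i\}}$:

(a) either there is one satisfying variable and everything else is unsatisfying, or

(b) there are at least two variables that are satisfying or *.

We first calculate the weight $W(A)$ assigned to possibility (a), again using the BP update
equation (26):

\[
\begin{aligned}
W(A) &= \sum_{k \in V(a) \setminus \{i\}} \; \sum_{S_k \subseteq C^s_a(k)} M_{k \to a}(s_{a,k}, S_k \cup \{a\}) \prod_{j \in V(a) \setminus \{i, k\}} \; \sum_{S_j \subseteq C^u_a(j)} M_{j \to a}(u_{a,j}, S_j) \\
&= \sum_{k \in V(a) \setminus \{i\}} R^s_{k \to a} \prod_{j \in V(a) \setminus \{i, k\}} R^u_{j \to a},
\end{aligned} \tag{33}
\]

where we have used the definitions of $R^s_{k \to a}$ and $R^u_{j \to a}$ from Section B.2.

We now calculate the weight $W(B)$ assigned to possibility (b) in the following way. From our
calculations in part (ii), we found that the weight assigned to the event that each variable is either
unsatisfying, satisfying or * is $\prod_{j \in V(a) \setminus \{i\}} [R^u_{j \to a} + R^*_{j \to a}]$. The weight $W(B)$ is given by subtracting
from this quantity the weight assigned to the event that there are not at least two * or satisfying
assignments. This event can be decomposed into the disjoint events that either all assignments
are unsatisfying (with weight $\prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a}$ from part (ii)); or that exactly one variable is * or
satisfying. The weight corresponding to this second possibility is

\[
\begin{aligned}
&\sum_{k \in V(a) \setminus \{i\}} \Big[ M_{k \to a}(*, \emptyset) + \sum_{S_k \subseteq C^s_a(k)} M_{k \to a}(s_{a,k}, S_k) \Big] \prod_{j \in V(a) \setminus \{i, k\}} \; \sum_{S_j \subseteq C^u_a(j)} M_{j \to a}(u_{a,j}, S_j) \\
&\qquad = \sum_{k \in V(a) \setminus \{i\}} R^*_{k \to a} \prod_{j \in V(a) \setminus \{i, k\}} R^u_{j \to a}.
\end{aligned}
\]

Combining our calculations so far we have

\[
W(B) = \prod_{j \in V(a) \setminus \{i\}} \big[ R^u_{j \to a} + R^*_{j \to a} \big]
 - \sum_{k \in V(a) \setminus \{i\}} R^*_{k \to a} \prod_{j \in V(a) \setminus \{i, k\}} R^u_{j \to a}
 - \prod_{j \in V(a) \setminus \{i\}} R^u_{j \to a}. \tag{34}
\]

Finally, summing together the forms of $W(A)$ and $W(B)$ from equations (33) and (34) respectively,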
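and then factoring yields the desired equation (30b). □

C Proofs for random formulae

C.1 Proof of Lemma 11

In order to prove (21), it suffices by the Markov inequality to show that for every integer $d$ in the
interval $[\delta n, 2\delta n]$, it holds that

\[
\frac{\mathbb{E}^{n}\big[ W^{\phi}[\, n - n_*(x) = d \,] \big]}{\rho^{3n}} = \exp(-\Omega(n)). \tag{35}
\]

To establish (35), consider a fixed set of $d$ variables. The average $W$-weight assigned to the event
that this set of size $d$ constitutes all the non-star variables is bounded by

\[
\rho^{n-d} \sum_{r=0}^{d} \binom{d}{r} (1 - \rho)^{d-r} \binom{\alpha n}{r} \left( \frac{d}{n} \right)^{kr},
\]

where $r$ represents the number of constrained variables. We obtain this bound by the following
reasoning. First, the $n - d$ variables assigned * all receive weight $\rho$. Otherwise, if $r$ out of the
remaining $d$ variables are constrained, there must be $r$ clauses chosen from a total of $\alpha n$, and each
such clause must have all of its $k$ variables chosen from within the set of $d$ non-star variables.
Consequently, the total probability of having $d$ non-star variables is bounded by

\[
\begin{aligned}
\rho^{n-d} \binom{n}{d} \sum_{r=0}^{d} \binom{d}{r} (1 - \rho)^{d-r} \binom{\alpha n}{r} \left( \frac{d}{n} \right)^{kr}
&\leq \rho^{n-d} \left( \frac{en}{d} \right)^{d} \sum_{r=0}^{d} \left( \frac{ed}{r} \right)^{r} (1 - \rho)^{d-r} \left( \frac{\alpha e n}{r} \right)^{r} \left( \frac{d}{n} \right)^{kr} \\
&= \rho^{n-d} \left( \frac{(1 - \rho)\, e n}{d} \right)^{d} \sum_{r=0}^{d} \left( \frac{e^2 \alpha\, d^{k+1}}{r^2 (1 - \rho)\, n^{k-1}} \right)^{r}.
\end{aligned}
\]

Recalling that $1 - \rho = \delta^2$ and $d \in [\delta n, 2\delta n]$, we obtain that the last expression is at most

\[
\rho^{n - 2\delta n}\, (e\delta)^{\delta n} \sum_{r=0}^{2\delta n} \left( \frac{2^{k+1} \alpha\, \delta^{k-1} n^2 e^2}{r^2} \right)^{r},
\]

where the final inequality is valid when $\delta e < 1$. A straightforward calculation yields that the
function $g(r) := \left( 2^{k+1} \alpha\, \delta^{k-1} n^2 e^2 / r^2 \right)^{r}$ is maximized at $r^* = \sqrt{2^{k+1} \alpha\, \delta^{k-1}}\; n$ and the associated value is
$g(r^*) = e^{2 r^*}$. Consequently, the sum above is bounded by

\[
2\delta n\, \rho^{n - 2\delta n} \left[ \delta \exp\left( 1 + \frac{2 r^*}{\delta n} \right) \right]^{\delta n}
= 2\delta n\, \rho^{n - 2\delta n} \left[ \delta \exp\left( 1 + \sqrt{2^{k+3} \alpha\, \delta^{k-3}} \right) \right]^{\delta n}.
\]

This expression is exponentially smaller than $\rho^{3n}$ for large $n$ if

\[
\left[ \delta \exp\left( 1 + \sqrt{2^{k+3} \alpha\, \delta^{k-3}} \right) \right]^{\delta} < \rho^{3} = (1 - \delta^2)^{3}. \tag{36}
\]

Inequality (36) holds for sufficiently small $\delta > 0$, which establishes the lemma.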
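
C.2 Proof of Lemma 13

It will be useful to denote $\prod_{b \in C^s_a(j)} (1 - \eta_{b \to j})$ by $P_s(j)$ and $\prod_{b \in C^u_a(j)} (1 - \eta_{b \to j})$ by $P_u(j)$. With this
notation, the $j$-th term in the SP($\rho$) update for $\eta_{a \to i}$ is given by

\[
\begin{aligned}
\frac{\Pi^u_{j \to a}}{\Pi^u_{j \to a} + \Pi^s_{j \to a} + \Pi^*_{j \to a}}
&= \frac{(1 - \rho P_u(j))\, P_s(j)}{(1 - \rho P_u(j))\, P_s(j) + (1 - P_s(j))\, P_u(j) + P_s(j)\, P_u(j)} \\
&= \frac{(1 - \rho P_u(j))\, P_s(j)}{P_s(j) + P_u(j) - \rho P_s(j)\, P_u(j)} \leq 1 - \rho P_u(j).
\end{aligned}
\]

We therefore conclude that

\[
\eta_{a \to i} \leq \prod_{j \in V(a) \setminus \{i\}} (1 - \rho P_u(j)).
\]

On the other hand, we have $P_u(j) = \prod_{b \in C^u_a(j)} (1 - \eta_{b \to j}) \geq 1 - \sum_{b \in C^u_a(j)} \eta_{b \to j}$, so that

\[
1 - \rho P_u(j) \leq \min\Big( 1,\; (1 - \rho) + \rho \sum_{b \in C^u_a(j)} \eta_{b \to j} \Big).
\]

This yields the bound $\eta^{t+1}_{a \to i} \leq \prod_{j \in V(a) \setminus \{i\}} \min\big( 1, (1 - \rho) + \rho \sum_{b \in C^u_a(j)} \eta^{t}_{b \to j} \big)$, from which
equation (22) follows.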
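The chain of inequalities in this proof can be spot-checked numerically. The sketch below assumes the forms $\Pi^u = (1 - \rho P_u) P_s$, $\Pi^s = (1 - P_s) P_u$ and $\Pi^* = P_s P_u$ used in the argument above, draws random messages in $[0, 1)$, and verifies that the ratio is at most $1 - \rho P_u$, which in turn is at most $\min(1, (1 - \rho) + \rho \sum \eta)$. All names are hypothetical.

```python
import math
import random

def sp_ratio(eta_s, eta_u, rho):
    """Return (Pi^u / (Pi^u + Pi^s + Pi^*), P_u) for one variable j.

    eta_s : messages eta_{b->j} over clauses where j has the same sign as in a
    eta_u : messages eta_{b->j} over clauses where j has the opposite sign
    """
    P_s = math.prod(1.0 - e for e in eta_s)
    P_u = math.prod(1.0 - e for e in eta_u)
    pi_u = (1.0 - rho * P_u) * P_s
    pi_s = (1.0 - P_s) * P_u
    pi_star = P_s * P_u
    return pi_u / (pi_u + pi_s + pi_star), P_u

def check_bounds(trials=1000, seed=0, tol=1e-12):
    """Randomized check of ratio <= 1 - rho * P_u <= min(1, (1 - rho) + rho * sum(eta_u))."""
    rng = random.Random(seed)
    for _ in range(trials):
        rho = rng.random()
        eta_s = [rng.random() for _ in range(rng.randrange(4))]
        eta_u = [rng.random() for _ in range(rng.randrange(4))]
        ratio, P_u = sp_ratio(eta_s, eta_u, rho)
        middle = 1.0 - rho * P_u
        upper = min(1.0, (1.0 - rho) + rho * sum(eta_u))
        if ratio > middle + tol or middle > upper + tol:
            return False
    return True
```

A randomized check of this kind does not replace the proof, but it is a quick guard against sign or index slips when the update formulas are transcribed into code.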

C.3 Proof of Lemma 15

We start by estimating the probability that a vertex is bad by induction. Let $g_K$ denote the
probability that $v$ has more than $K$ children, or that one of $v$'s children has more than $K$ children.
Clearly,

\[
g_K \leq (K + 1)\, \mathbb{P}[Y > K] \leq (K + 1)(k - 1)\, \mathbb{P}\Big[ \mathrm{Bin}\Big(M_c, \frac{\alpha n}{N_c}\Big) > \frac{K}{k - 1} \Big] \leq \exp(-\Omega(K)). \tag{37}
\]

Write $q(m, K) = 1 - p(m, K)$ and note that $q(0, K) = 0$ and $q(1, K) \leq g_K$. By induction, a vertex
can be bad for two reasons: it has too many descendants in the two levels below it, or it has 2 bad
descendants in the two levels below it. We may thus bound the probability of a vertex being bad as

\[
q(s, K) \leq g_K + \mathbb{P}[\mathrm{Bin}(K^2, q(s - 2, K)) \geq 2]. \tag{38}
\]

Note also that

\[
\mathbb{P}[\mathrm{Bin}(K^2, q(s - 2, K)) \geq 2] \leq K^4 q(s - 2, K)^2. \tag{39}
\]

Combining (38) and (39) yields

\[
q(s, K) \leq g_K + K^4 q(s - 2, K)^2. \tag{40}
\]

By (37), when $K$ is sufficiently large, $K^4 (2 g_K)^2 < g_K$. Thus when $K$ is sufficiently large, it follows
from equation (40) that

\[
q(s, K) \leq 2 g_K
\]

for all $s$. Finally when $K$ is sufficiently large, $p(s, K) \geq 1 - 2 g_K$ for all $s$ and $1 - 2 g_K \geq 1 -
\exp(-\Omega(K))$ as needed.
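The final step of the argument, that the recursion $q(s, K) \leq g_K + K^4 q(s - 2, K)^2$ keeps $q(s, K)$ below $2 g_K$ whenever $K^4 (2 g_K)^2 < g_K$, can be illustrated by iterating the upper bound directly. The function name and the sample values $g_K = 10^{-4}$, $K = 3$ are hypothetical.

```python
def bad_probability_bounds(g, K, levels):
    """Iterate the upper bound q(s) = g + K^4 * q(s - 2)^2,
    starting from q(0) = 0 and q(1) = g."""
    q = [0.0, g]
    for s in range(2, levels + 1):
        q.append(g + K ** 4 * q[s - 2] ** 2)
    return q

g, K = 1e-4, 3
assert K ** 4 * (2 * g) ** 2 < g          # the "K sufficiently large" condition
bounds = bad_probability_bounds(g, K, levels=50)
print(max(bounds) <= 2 * g)               # True: the bound never exceeds 2 * g
```

The induction is visible in the code: if the value two levels down is at most $2g$, the new value is at most $g + K^4 (2g)^2 < 2g$.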
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